
Significance Testing vs Effect Size Estimation

The two approaches outlined here, testing the null hypothesis of no effect and
estimating the size of the effect, are closely connected. A study that yields a
p-value of precisely .05 will yield a 95% confidence interval that begins
(or ends) precisely at zero. A study that yields a p-value of precisely
.01 will yield a 99% confidence interval that begins (or ends) precisely
at zero. In this sense, reporting an effect size with its confidence
interval can serve as a surrogate for a test of significance: if the confidence
interval does not include the nil effect, the study is statistically significant.
Nevertheless, shifting the focus of a report away from significance tests
and toward the effect size estimate offers a number of important advantages.

First, effect size focuses attention on the key issue. Usually, researchers and
clinicians care about the size of the effect; whether or not the effect is nil
is of relatively minor interest. For example, a clinician might recommend a
drug, despite its potential for side effects, if confident that it increased the
remission rate by some specific amount such as 20%, 30%, or 40%. Merely knowing
that it increased the rate by some amount exceeding zero is of little import.
The effect size with its confidence interval focuses attention on the key index
(how large is the effect) while providing likely lower and upper limits for the
true effect size in the population.
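
As a rough illustration of reporting an effect size with its confidence interval, the sketch below computes a difference in remission rates with a simple Wald interval. The function name `risk_difference_ci` and the trial counts are hypothetical, and the Wald formula is only one of several possible interval methods; it is not necessarily the one this program uses.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_t, n_t, events_c, n_c, level=0.95):
    """Wald confidence interval for a difference in two proportions.

    Illustrative helper (not part of any program described here).
    Returns (difference, lower limit, upper limit).
    """
    p_t = events_t / n_t          # remission rate, treated group
    p_c = events_c / n_c          # remission rate, control group
    diff = p_t - p_c              # effect size: difference in rates
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided critical value
    return diff, diff - z * se, diff + z * se


# Made-up counts: 60 of 100 remit on the drug, 40 of 100 on placebo.
diff, lo, hi = risk_difference_ci(60, 100, 40, 100)
print(f"difference = {diff:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

With these made-up counts the interval excludes zero, yet its lower limit lies well below 20 percentage points, so a clinician who requires an increase of at least 20 points would not be reassured by significance alone.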

Second, the focus on effect size rather than statistical significance helps the
researcher and the reader avoid some mistakes that are common (indeed
ubiquitous) in the interpretation of significance tests. Because researchers
primarily care about the size of the effect (and not whether or not the effect
is nil), they tend to interpret the results of a significance test as though
these results were an indication of effect size. For example, a p-value of .001
is assumed to reflect a large effect while a p-value of .05 is assumed to
reflect a moderate effect. This is inappropriate because the p-value is a
function of sample size as well as effect size. Similarly, a non-significant
p-value is often assumed to indicate that the treatment has been proven
ineffective. In fact, a non-significant p-value could reflect the fact that the
treatment is not effective but could just as easily reflect the fact that the
study was under-powered.
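
The dependence of the p-value on sample size can be seen in a minimal sketch. It assumes a two-group comparison of means with a known common standard deviation, so that a normal (z) approximation applies; the function names `two_sided_p` and `approx_power` and the effect size of 0.3 are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(d, n_per_group):
    """Two-sided p-value for an observed standardized difference d
    with n_per_group cases in each of two groups (z approximation)."""
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power to detect a true standardized difference d,
    ignoring the negligible probability of rejecting in the wrong tail."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(d * sqrt(n_per_group / 2) - z_crit)


# The same effect size, three different sample sizes.
for n in (20, 80, 320):
    print(f"n per group = {n:3d}  "
          f"p = {two_sided_p(0.3, n):.4f}  "
          f"power = {approx_power(0.3, n):.2f}")
```

Under these assumptions the effect size is identical in every row, yet the p-value shrinks as the sample grows, and the smallest study has little power to detect the effect at all.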

If power analysis is the logical precursor to a study that will test the null
hypothesis,
then precision analysis is the logical precursor to a study that will
be used to estimate the size of a treatment effect. This program allows
the researcher to take account of both.
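
To make the idea of precision analysis concrete, the sketch below estimates how wide a confidence interval to expect for a given sample size, and how many cases are needed to reach a target width. It assumes a standardized mean difference between two equal groups with a known standard deviation of 1; the function names are hypothetical, and this is a simplified stand-in rather than the program's own procedure.

```python
from math import ceil, sqrt
from statistics import NormalDist


def ci_half_width(n_per_group, level=0.95):
    """Expected half-width of the confidence interval for a standardized
    mean difference with n_per_group cases per group (SD assumed to be 1)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sqrt(2 / n_per_group)


def n_for_half_width(target, level=0.95):
    """Smallest n per group whose expected half-width is at most target."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil(2 * (z / target) ** 2)


# Planning question: how many cases per group are needed to estimate
# the effect to within plus or minus 0.2 standard deviations?
n_needed = n_for_half_width(0.2)
print(n_needed, round(ci_half_width(n_needed), 4))
```

Where power analysis asks how likely the study is to reject the null hypothesis, this calculation asks how precisely the study will estimate the effect, which is the relevant question when estimation is the goal.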