Role of Effect Size in Power Analysis

The term "effect size" refers to the magnitude of the effect under the alternative hypothesis. The nature of the effect size will vary from one statistical procedure to the next (it could be the difference in cure rates, a standardized mean difference, or a correlation coefficient), but its function in power analysis is the same in all procedures.

The effect size should represent the smallest effect that would be of clinical or substantive significance, and for this reason it will vary from one study to the next. In clinical trials, for example, the selection of an effect size might take account of the severity of the illness being treated (a treatment effect that reduces mortality by one percent might be clinically important, while a treatment effect that reduces transient asthma by 20% may be of little interest). It might take account of the existence of alternative treatments (if alternative treatments exist, a new treatment would need to surpass them to be important). It might also take account of the treatment's cost and side effects (a treatment that carried these burdens would be adopted only if the treatment effect was very substantial).

Power analysis gives power for a specific effect size. For example, the researcher might report "If the treatment increases the recovery rate by 20 percentage points the study will have power of 80% to yield a significant effect". For the same sample size and alpha, if the treatment effect is less than 20 points then power will be less than 80%. If the true effect size exceeds 20 points, then power will exceed 80%.

[Figure: Power as a Function of Effect Size and N - Two sample proportions]

While one might be tempted to set the "clinically significant effect" at a small value to ensure high power for even a small effect, this determination cannot be made in isolation. The selection of an effect size reflects the need for balance between the size of the effect that we can detect and the resources available for the study.

Small effects will require a larger investment of resources than large effects. Figure 1 shows power as a function of sample size for three levels of effect size (assuming alpha, 2-tailed, is set at .05). For the smallest effect (30% vs. 40%) we would need a sample of 356 per group to yield power of 80%. For the intermediate effect (30% vs. 50%) we would need a sample of 93 per group to yield this level of power. For the highest effect size (30% vs. 60%) we would need a sample of 42 per group to yield power of 80%. We may decide that it would make sense to enroll 93 per group to detect the intermediate effect but inappropriate to enroll 356 patients per group to detect the smallest effect.
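The per-group sample sizes quoted above can be approximated with the standard normal-approximation formula for comparing two independent proportions. The following is a minimal sketch under stated assumptions (equal group sizes, no continuity correction); it is not necessarily the algorithm the program uses, which the text does not specify. The function name `n_per_group` and its parameters are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided test of two proportions.

    Assumes a normal approximation, equal group sizes, and no
    continuity correction (all assumptions of this sketch).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value, two-tailed
    z_power = z.inv_cdf(power)           # quantile for the target power
    p_bar = (p1 + p2) / 2                # pooled proportion under the null
    sd_null = sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n = ((z_alpha * sd_null + z_power * sd_alt) / (p1 - p2)) ** 2
    return ceil(n)


for p2 in (0.40, 0.50, 0.60):
    print(f"30% vs. {p2:.0%}: about {n_per_group(0.30, p2)} per group")
```

Under these assumptions the sketch reproduces the figures quoted above (356, 93, and 42 per group). Other methods, for example one that applies a continuity correction, would give somewhat different values.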

The "true" (population) effect size is not known. While the effect size in the power analysis is assumed to reflect the population effect size for the purpose of calculations, the power analysis is more appropriately expressed as "If the true effect is this large, power would be ..." rather than "The true effect is this large, and therefore power is ..."

This distinction is an important one. Researchers sometimes assume that a power analysis cannot be performed in the absence of pilot data. In fact, it is usually possible to perform a power analysis based entirely on a logical assessment of what constitutes a clinically (or theoretically) important effect. Indeed, while the effect observed in prior studies might help to provide an estimate of the true effect, it is not likely to be the true effect in the population. If we knew that the effect size in these studies was accurate, there would be no need to run the new study.

Since the effect size used in power analysis is not the "true" population value, the researcher may elect to present a range of power estimates. For example (assuming N=93 per group and alpha=.05, 2-tailed), "The study will have power of 80% to detect a treatment effect of 20 points (30% vs. 50%), and power of 99% to detect a treatment effect of 30 points (30% vs. 60%)".
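A range of power estimates like this one can be tabulated by holding N and alpha fixed and varying the assumed effect. The sketch below uses a normal approximation for two independent proportions; the function name `power_two_proportions` is hypothetical, and the program itself may use a different computation.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    n is the sample size per group; the normal approximation and equal
    group sizes are assumptions of this sketch.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)          # SE if no effect
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)   # SE under the effect
    # Probability of rejecting in the direction of the assumed effect; the
    # negligible chance of rejecting in the opposite direction is ignored.
    return z.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)


for p2 in (0.40, 0.50, 0.60):
    pw = power_two_proportions(0.30, p2, n=93)
    print(f"30% vs. {p2:.0%}, N=93 per group: power = {pw:.0%}")
```

With N=93 per group the sketch gives roughly 80% and 99% for the 20-point and 30-point effects, and only about 30% for a 10-point effect (30% vs. 40%), which illustrates how quickly power falls when the true effect is smaller than the one assumed.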

Cohen has suggested "conventional" values for "small", "medium" and "large" effects in the social sciences. The researcher may want to use these values as a kind of reality-check, to ensure that the values he/she has specified make sense relative to these anchors. The program also allows the user to work directly with one of the conventional values rather than specifying an effect size, but it is preferable to specify an effect based on the criteria outlined above, rather than relying on conventions.
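As a reality-check of this kind, one can express a difference between two proportions on a standardized scale and compare it with Cohen's anchors. The sketch below uses Cohen's h (the difference between arcsine-transformed proportions), for which his conventional values are 0.2, 0.5, and 0.8. Whether the program uses h for this purpose is not stated here, so treat the choice of index, and the names `cohens_h` and `COHEN_H_ANCHORS`, as assumptions.

```python
from math import asin, sqrt

# Cohen's conventional anchors for h (small, medium, large).
COHEN_H_ANCHORS = {"small": 0.2, "medium": 0.5, "large": 0.8}


def cohens_h(p1, p2):
    """Cohen's h: absolute difference of arcsine-transformed proportions."""
    return abs(2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1)))


for p2 in (0.40, 0.50, 0.60):
    print(f"30% vs. {p2:.0%}: h = {cohens_h(0.30, p2):.2f}")
```

For the three comparisons discussed earlier, h comes out at about 0.21, 0.41, and 0.61, so the 10-point effect sits near the "small" anchor and the larger effects fall between "small" and "large". As the text notes, such anchors serve only as a check; the effect size itself should be chosen on substantive grounds.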
