The power of a clinical trial is the probability that the trial will find a difference between groups if there is one. Power can be defined as the probability of a true positive trial result and is often written as:
power = (1 - β)
where β is the probability of missing a difference between groups (a false negative result).
Most clinical trials aim for a power of 80%, i.e., an 80% chance of finding a difference if there is one. By the definition of power above, such trials accept a β of 20%, i.e., a 20% chance of missing a real difference and having a false negative trial result.
For radiologists, a useful analogy is that power is like sensitivity: both are measures of the likelihood of finding something, a difference between groups in the case of power and disease in the case of a diagnostic test.
A variety of factors determine the power of a study including the size and variability of the effect under study, the level of alpha (α), and the sample size.
Effect size
The larger the effect under study, the easier it is to recognize, and the higher the power. For example, the effect of a new chemotherapeutic agent that results in a 100% cure rate of a previously incurable cancer would be easy to identify.
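The relationship between effect size and power can be sketched numerically. The example below is a minimal illustration, not a trial design tool: it assumes a two-sided, two-sample z-test with a known, equal standard deviation in both groups, and the function name `z_test_power` and all numbers passed to it are hypothetical.

```python
from statistics import NormalDist

def z_test_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta: true difference between group means (hypothetical value)
    sigma: outcome standard deviation, assumed known and equal in both groups
    n_per_group: number of participants in each group
    """
    std_normal = NormalDist()
    # Standard error of the difference between two group means
    se = sigma * (2 / n_per_group) ** 0.5
    # Critical value that the test statistic must exceed to reject H0
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta / se
    # Probability that the test statistic lands in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Larger true differences give higher power, all else being equal
for delta in (0, 2, 5, 10):
    power = z_test_power(delta, sigma=10, n_per_group=50)
    print(f"true difference {delta:>2}: power = {power:.2f}")
```

With a true difference of 0, the "power" returned is simply α: the only positive results possible are false positives.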
Effect variability
The less variable the effect under study is, the easier it is to recognize, and the higher the power. For example, the effect of a new weight loss agent that resulted in a uniform loss of 10 kilograms among all study participants would be easier to identify than one that resulted in a range of weight loss with a high standard deviation. For radiologists, a useful analogy is the signal-to-noise ratio: the noisier the data, the less likely one is to recognize the signal.
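The signal-to-noise analogy can be illustrated with a small simulation. The sketch below repeatedly simulates a hypothetical two-group trial and counts how often H0 is rejected; it assumes normally distributed outcomes, a known standard deviation, and a two-sided z-test. The function name `simulated_power` and the numbers used are illustrative assumptions only.

```python
import random
from statistics import NormalDist

def simulated_power(delta, sigma, n_per_group, alpha=0.05, n_trials=2000, seed=0):
    """Estimate power as the fraction of simulated trials that reject H0."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n_per_group) ** 0.5  # standard error of the difference in means
    rejections = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        treated = [rng.gauss(delta, sigma) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(diff / se) > z_crit:
            rejections += 1
    return rejections / n_trials

# Same true effect (a hypothetical 5 kg difference), increasingly noisy outcomes
for sigma in (2, 5, 10, 20):
    power = simulated_power(delta=5, sigma=sigma, n_per_group=20)
    print(f"standard deviation {sigma:>2} kg: estimated power = {power:.2f}")
```

The true effect is identical in every run; only the noise changes, and the estimated power falls as the standard deviation rises.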
Level of α
The effect of the level of α is the least intuitive factor affecting power. Recall that α is the pre-determined threshold for rejecting the null hypothesis (H0) that there is no difference between groups, customarily set at 0.05. If the p-value of a study is below 0.05, then H0 is rejected; if the p-value is 0.05 or above, the conclusion is that there is insufficient evidence to reject H0.
If power is analogous to sensitivity, then α is analogous to 1-specificity or the false positive rate. By setting α at 0.05, we accept a false positive rate of 5%. If we decrease α to 0.01, we decrease our false positive rate to 1%, but in doing so, we increase β, the false negative rate, and so decrease power. A decrease in α means that we are less likely to reject H0. This protects us from false positive results but increases the number of times we fail to reject H0 when it is false, decreasing the number of true positives. Increasing α makes it easier to reject H0, increasing both the number of false positives and the number of true positives, and thus power.
Consider the two extremes. If α=0, H0 will never be rejected and all study results will be negative: there will be no false positives (good) but also no true positives (bad), resulting in a power of 0. If α=1, H0 will always be rejected and all study results will be positive: the only possible outcomes will be false positives (bad) and true positives (good), resulting in a power of 1. Increasing the number of negative results increases both true negatives and false negatives, while increasing the number of positive results increases both true positives and false positives. More false negatives decrease power, and more true positives increase power.
A more intuitive analogy compares the level of α to the amount of evidence required to convict a defendant at trial. The lower the α, the more evidence is needed to convict, resulting in fewer false positive convictions but also fewer true positive convictions, decreasing power. The higher the α, the less evidence is needed to convict, resulting in more false positive convictions and more true positive convictions, increasing power.
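The trade-off between α and power, including the two extremes, can be sketched as follows. This is a minimal illustration assuming a two-sided z-test; the function name `rejection_rates` and the standardized effect of 2.8 (the true difference divided by its standard error) are hypothetical choices, picked so that power is close to 80% at α=0.05.

```python
from statistics import NormalDist

def rejection_rates(alpha, standardized_effect):
    """Return (false positive rate, power) for a two-sided z-test.

    standardized_effect: true difference divided by its standard error
    (a hypothetical value chosen for illustration).
    """
    if alpha <= 0:
        return 0.0, 0.0  # H0 is never rejected: no false or true positives
    if alpha >= 1:
        return 1.0, 1.0  # H0 is always rejected: every result is positive
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    power = (std_normal.cdf(standardized_effect - z_crit)
             + std_normal.cdf(-standardized_effect - z_crit))
    return alpha, power

# Lowering alpha reduces false positives but also reduces power
for alpha in (0, 0.001, 0.01, 0.05, 0.2, 1):
    false_positive_rate, power = rejection_rates(alpha, standardized_effect=2.8)
    print(f"alpha = {alpha:<5}: false positive rate = {false_positive_rate:.3f}, "
          f"power = {power:.3f}")
```

The two extremes are handled explicitly because the critical value is undefined (infinite) at α=0.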
Sample size
Power increases as sample size increases. Because the sample size is in the denominator of the formula for the standard error (the standard deviation divided by the square root of the sample size), increasing the sample size decreases the standard error, so the effect is estimated more precisely. Because effect size and variability depend on biology and α is almost universally set at 0.05, the sample size is the easiest clinical trial parameter for investigators to control.
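Because sample size is the parameter investigators control, the usual question is how many participants are needed to reach a target power. The sketch below uses the common normal-approximation formula for comparing two means with equal group sizes; it assumes a known, equal standard deviation in both groups, and the function name `n_per_group_needed` and the input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_needed(delta, sigma, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided, two-sample z-test.

    Uses n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2, rounded up.
    delta and sigma are assumed values supplied by the investigator.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = std_normal.inv_cdf(power)           # quantile for power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Halving the expected difference roughly quadruples the required sample size
for delta in (10, 5, 2.5):
    n = n_per_group_needed(delta, sigma=10)
    print(f"expected difference {delta:>4}: about {n} participants per group")
```

A real trial would typically use a t-based calculation and allow for dropout, so these figures are best read as a lower bound.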
Another analogy makes the effect of sample size easier to grasp. A coin that landed heads on 3 consecutive tosses would be unlikely to raise suspicion of an unfair coin, but 30 or even 300 heads in a row is convincing evidence of the coin’s unfairness.
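The coin analogy can be made concrete by computing how likely a run of heads would be if the coin were fair. The sketch below assumes independent tosses; the function name `prob_all_heads` is hypothetical.

```python
def prob_all_heads(n_tosses, p_heads=0.5):
    """Probability that every one of n independent tosses lands heads."""
    return p_heads ** n_tosses

# The same observed pattern (all heads) becomes far less plausible
# under a fair coin as the number of tosses grows
for n in (3, 30, 300):
    print(f"{n:>3} heads in a row: probability {prob_all_heads(n):.3g} for a fair coin")
```

Three heads in a row has a probability of 0.125 with a fair coin, well above the customary α of 0.05, which is why such a short run would not justify rejecting the hypothesis that the coin is fair.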
Caveat
It’s important to remember that this discussion of power only considers the influence of random factors and excludes consideration of other factors that influence clinical trial results such as bias.