Hypothesis testing

Last revised by Stefan Tigges on 3 Jan 2024

Citation:

Tigges S, Feger J, Hypothesis testing. Reference article, Radiopaedia.org (Accessed on 30 Jun 2024) https://doi.org/10.53347/rID-180060

Permalink:

https://radiopaedia.org/articles/180060

rID:

180060

Article created:

5 Dec 2023, Stefan Tigges

Disclosures:

At the time the article was created Stefan Tigges had no financial relationships to ineligible companies to disclose.

Disclosures:

At the time the article was last revised Stefan Tigges had no financial relationships to ineligible companies to disclose.

Revisions:

7 times, by 2 contributors

Tags:

statistics, research

Hypothesis testing is a statistical method used to evaluate clinical trial results and consists of four consecutive steps:

specification of the null hypothesis and the alternative hypothesis
data collection
calculation of a test statistic and its p-value
rejection or failure to reject the null hypothesis

The null hypothesis (H0) is that there is no difference between the groups being evaluated, while the alternative hypothesis (HA) is that there is a difference between the groups. For example, in the National Lung Screening Trial (NLST), H0 is that there was no difference in lung cancer mortality between subjects screened for lung cancer with low-dose CT and those screened with chest X-ray, while HA is that there was a mortality difference. Hypothesis testing evaluates the plausibility of the null hypothesis in light of the data gathered in the trial.

Data collection

Carry out the clinical trial and gather data in a way suitable to test the hypothesis.

P-value calculation

Calculate a p-value. The p-value is the probability of getting a result at least as extreme as the one observed in the trial, assuming that the null hypothesis is true. It is therefore a conditional probability: P(result at least as extreme as observed | H0 true). Because the p-value is calculated under the null hypothesis, the "expected" trial result is no difference between groups. Random sampling error nevertheless produces small differences between identical groups, and the p-value quantifies how often a difference at least as large as the observed one would arise from such error alone.
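
The definition of the p-value can be illustrated with a short simulation. This is a minimal sketch in Python, not the analysis used in any trial: the function name `simulated_p_value`, the event counts in the usage note, and the choice of the pooled event rate as the null model are all assumptions made for illustration. Both simulated groups are drawn from the same event rate (so H0 is true by construction), and the p-value is estimated as the fraction of simulated trials whose difference is at least as extreme as the observed one.

```python
import random


def simulated_p_value(events_a, n_a, events_b, n_b, n_sim=2000, seed=0):
    """Estimate a two-sided p-value by simulating trials in which H0 is true.

    Illustrative only: the null model assumes both groups share the pooled
    event rate, and the estimate carries Monte Carlo error.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pooled_rate = (events_a + events_b) / (n_a + n_b)

    # Compare differences in proportions using integer cross-products,
    # |a/n_a - b/n_b| >= |x/n_a - y/n_b|  <=>  |a*n_b - b*n_a| >= |x*n_b - y*n_a|,
    # which avoids floating-point ties.
    observed = abs(events_a * n_b - events_b * n_a)

    at_least_as_extreme = 0
    for _ in range(n_sim):
        # Each subject has the same event probability in both groups (H0).
        sim_a = sum(rng.random() < pooled_rate for _ in range(n_a))
        sim_b = sum(rng.random() < pooled_rate for _ in range(n_b))
        if abs(sim_a * n_b - sim_b * n_a) >= observed:
            at_least_as_extreme += 1

    return at_least_as_extreme / n_sim
```

With made-up counts, `simulated_p_value(20, 200, 20, 200)` returns 1.0, because every simulated difference is at least as large as an observed difference of zero. A call such as `simulated_p_value(10, 200, 40, 200)` returns a very small value, because chance alone rarely produces so large a gap between identical groups.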

Null hypothesis rejection

Compare the p-value with alpha (α), the predetermined significance level below which we "reject" the null. Alpha is usually set at 0.05 (5%). This means that if the p-value is <0.05, we conclude that H0 is implausible and we reject the null. Remember, under the assumption that the null hypothesis is true, we expect no difference between groups: if the probability of seeing a difference at least as large as the one observed is small, perhaps the null hypothesis is incorrect. In the NLST, for example, there were 20% fewer lung cancer deaths in the low-dose CT group than in the chest X-ray group, with a p-value of 0.004, so the null hypothesis was rejected. If the p-value is ≥0.05, we conclude that we have insufficient evidence to reject the null. Because the p-value is calculated assuming H0 is true, it cannot show that H0 is true, and it is incorrect to "accept" the null when p is ≥0.05.
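
The decision rule can be written as a small function. This is a sketch only: the function name `decide` and the returned strings are illustrative choices, and the default α of 0.05 follows the convention described in this section.

```python
def decide(p_value, alpha=0.05):
    """Apply the conventional decision rule to a p-value.

    Returns "reject H0" when p < alpha, otherwise "fail to reject H0".
    There is deliberately no "accept H0" outcome.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")

    # A p-value exactly equal to alpha does not meet the strict threshold.
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"
```

Applied to the NLST p-value of 0.004, `decide(0.004)` returns "reject H0". A hypothetical p-value of 0.20 gives "fail to reject H0".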

Problems with hypothesis testing

Hypothesis testing is a nearly universal feature of articles in the medical literature but is unsatisfactory for multiple reasons.

the p-value gives no information regarding the size of an effect
the p-value gives no information regarding the precision of the estimated effect. For this reason, confidence intervals are often used instead of or in addition to p-values
p-values don’t tell us about what we’re actually interested in. We want to know how likely the alternative hypothesis is given the data that we observed, but p-values tell us how likely our observation is assuming that the null hypothesis is true
statistical significance (p-value < 0.05) is not the same as clinical significance
p-values cannot account for the effects of non-random error (bias)
overreliance on the p-value to determine the plausibility of the null hypothesis ignores the pre-experiment likelihood that H0 or HA is true. For example, if one were to perform 100 experiments on the ability of individuals to perform a psychic feat, at least one of these experiments would very likely show a "significant" effect through random error alone, even though a real effect is implausible beforehand
rejecting H0 is not the same as accepting HA
p-values do not address the probability of making a beta (type II) error, i.e. a false-negative clinical trial result
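
The example of 100 experiments in the list above can be made concrete. This is a sketch under stated assumptions: the experiments are independent, H0 is true in every one of them, and each is tested at the same α. The function names are hypothetical. Under those assumptions the probability of at least one false positive among n experiments is 1 − (1 − α)^n, and the expected number of false positives is n × α.

```python
def prob_at_least_one_false_positive(n_experiments, alpha=0.05):
    """Chance that at least one of n independent tests rejects a true H0."""
    if n_experiments < 0:
        raise ValueError("n_experiments must be non-negative")
    # Each test avoids a false positive with probability (1 - alpha);
    # all n avoid one with probability (1 - alpha) ** n.
    return 1 - (1 - alpha) ** n_experiments


def expected_false_positives(n_experiments, alpha=0.05):
    """Expected number of false positives when H0 is true in every test."""
    return n_experiments * alpha
```

For 100 experiments at α = 0.05, the first function gives roughly 0.994 and the second gives 5, so a "positive" result somewhere in the series is almost guaranteed by chance alone.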