BMR 617: Statistical Techniques for the Biomedical Sciences

Multiple Hypothesis Testing

In the section on ANOVA, we discussed the idea that performing multiple pairwise tests (e.g. t-tests) between pairs of data groups would increase our chances of a false positive. In that context, we used specific post-hoc tests to analyze the data in a way that still controlled the chances of a false positive. In this section we'll look at how performing multiple hypothesis tests increases the chances of a false positive in a more general context, and at ways to remedy the problem.

Recap on p-values

To understand this section, it's important to make sure we understand the definition and meaning of a p-value. If you need to, review the section on hypothesis testing. Then answer the following questions on p-values.

  1. Which of the following is a correct description of a p-value?
    1. The probability that the null hypothesis is true
    2. The probability that the null hypothesis is false
    3. The probability that, if the null hypothesis were true, you would see an association in your data at least as strong as the one you actually observed
    4. The actual p-value is unimportant; all that matters is if it is below 0.05. If it's below 0.05, the null hypothesis is false, and if it's above 0.05, the null hypothesis is true.
    5. The probability the data arose from random chance.
    Incorrect
    Review the section on hypothesis testing. Remember there is no way to know the probability the null hypothesis is true (or false), and that there is nothing particularly special (other than convention) about the value 0.05.
    Correct!
    Every hypothesis test computes the p-value by first assuming the null hypothesis is true, and then calculating how likely it would be to see data at least as extreme as the data actually observed.
  2. A misleading public media article incorrectly indicates that people born on Tuesdays have a higher chance of suffering from cancer at some point in their lives, and gains widespread publicity. As a result, many researchers conduct studies which look at historical medical records, extracting the birth date of patients and whether or not they suffered from cancer at some point. Each of these many studies is conducted independently (i.e. with a different set of patient records) and computes a p-value for the null hypothesis that people born on Tuesdays are just as likely to suffer from cancer as people born on other days of the week.

    Considering all these p-values together, and making the (reasonable) assumption that the null hypothesis here is true - i.e. that there is no relationship between being born on a Tuesday and suffering from cancer - what would these p-values look like?

    1. All of the p-values would be exactly equal to 1.
    2. The p-values would be spread out, but they would be skewed so that most of them would be close to 1.
    3. The p-values would be evenly spread out, but none of them would be below 0.05.
    4. The p-values would be evenly spread out between 0 and 1.
    Incorrect
    Remember the correct definition of a p-value. What proportion of the data sets generated, assuming the null hypothesis is true, would produce a p-value less than 0.5? Less than 0.1? Less than 0.05?
    Correct!
    Assuming the null hypothesis is true (which is a very reasonable assumption in this scenario), the p-value is the probability that a data set would show an association between being born on a Tuesday and suffering from cancer at least as strong as the one observed. For example, 50% of the time a data set would show an association at least as extreme as one that yielded \(p=0.5\), so 50% of the data sets would have \(p<0.5\). Similarly, 40% of the data sets would have \(p<0.4\), 30% would have \(p<0.3\), and so on. The p-values would therefore be evenly spread out between 0 and 1. In particular, on average, 5% of the p-values would be less than 0.05.
Many people find the answer to the second question quite surprising. We'll reinforce this idea in the next section by simulating this exact scenario. While this fictional example might seem like an unlikely scenario, there are strong parallels here to the ongoing controversy around the (incorrect) belief that the MMR vaccine causes autism.
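As a rough preview of that idea, the sketch below generates many independent "studies" in which birthday and cancer are unrelated by construction, and computes a p-value for each. Everything specific in it is an assumption made for illustration only: the study size, the 40% lifetime cancer risk, the number of studies, the function names, and the choice of a pooled two-proportion z-test (the studies in the example could use any valid test). It is not the simulation used later in the course.

```python
import math
import random


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    if n1 == 0 or n2 == 0:
        return 1.0  # one group is empty, so there is nothing to compare
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # everyone (or no one) had cancer: no evidence either way
    z = (x1 / n1 - x2 / n2) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_study(rng, n_patients=1000, p_tuesday=1 / 7, p_cancer=0.4):
    """Simulate one study in which the null hypothesis is true.

    p_cancer is a made-up lifetime risk; the key point is that it is the
    same for every patient, whatever day they were born on.
    """
    tuesday_n = tuesday_cancer = other_n = other_cancer = 0
    for _ in range(n_patients):
        born_tuesday = rng.random() < p_tuesday
        has_cancer = rng.random() < p_cancer  # independent of birthday
        if born_tuesday:
            tuesday_n += 1
            tuesday_cancer += has_cancer
        else:
            other_n += 1
            other_cancer += has_cancer
    return two_proportion_p_value(tuesday_cancer, tuesday_n,
                                  other_cancer, other_n)


rng = random.Random(617)  # fixed seed so the run is repeatable
p_values = [simulate_study(rng) for _ in range(1000)]

for cutoff in (0.05, 0.1, 0.5):
    fraction = sum(p < cutoff for p in p_values) / len(p_values)
    print(f"fraction of studies with p < {cutoff}: {fraction:.3f}")
```

If the reasoning in the quiz answer is right, each printed fraction should land close to its cutoff, so roughly one study in twenty reports \(p<0.05\) even though no real association exists.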