In the section on ANOVA, we discussed how performing multiple pairwise tests (e.g. t-tests) between pairs of data groups increases our chances of a false positive. In that context, we used specific post-hoc tests to analyze the data in a way that still controlled the chance of a false positive. In this section we'll look at the same problem in a more general context: performing multiple hypothesis tests of any kind increases the chance of at least one false positive. We'll also look at ways to remedy the problem.
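To get a feel for how quickly the problem grows, here is a minimal sketch of the arithmetic. It assumes the tests are independent, that every null hypothesis is true, and that each test uses a significance threshold of 0.05; under those assumptions the chance of at least one false positive across m tests is 1 - (1 - 0.05)^m. The function name `familywise_error_rate` is chosen here for illustration only.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests tests.

    Assumes the tests are independent and every null hypothesis is true.
    """
    return 1 - (1 - alpha) ** n_tests


for m in (1, 5, 10, 20, 100):
    print(f"{m:>3} tests: {familywise_error_rate(m):.3f}")
```

With a single test the chance is just the threshold itself (0.05), but with 20 independent tests it is already about 0.64.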
To understand this section, it's important to make sure we understand the definition and meaning of a p-value. If you need to, review the section on hypothesis testing. Then answer the following questions on p-values.
A misleading article in the popular media incorrectly claims that people born on Tuesdays have a higher chance of suffering from cancer at some point in their lives, and it gains widespread publicity. As a result, many researchers conduct studies which look at historical medical records, extracting the birth date of each patient and whether or not they suffered from cancer at some point. Each of these studies is conducted independently (i.e. with a different set of patient records) and computes a p-value for the null hypothesis that people born on Tuesdays are just as likely to suffer from cancer as people born on other days of the week.
Consider all these p-values together, and make the (reasonable) assumption that the null hypothesis here is true, i.e. that there is no relationship between being born on a Tuesday and suffering from cancer. What would these p-values look like?
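One way to explore this question is to simulate it. The sketch below is not taken from any real study: the number of studies (1000), the number of patients per study (1000), the lifetime cancer rate (0.4), and the choice of a pooled two-proportion z-test are all assumptions made purely for illustration. In every simulated study, the day of birth has no effect on cancer, so the null hypothesis is true by construction.

```python
import math
import random


def two_proportion_p_value(hits_a, n_a, hits_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    if n_a == 0 or n_b == 0:
        return 1.0  # nothing to compare
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation in outcomes
    z = (hits_a / n_a - hits_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_study(rng, n_patients=1000, cancer_rate=0.4):
    """Simulate one study in which birth day and cancer are unrelated."""
    tue_n = tue_hits = other_n = other_hits = 0
    for _ in range(n_patients):
        born_tuesday = rng.random() < 1 / 7
        has_cancer = rng.random() < cancer_rate  # same rate for every birth day
        if born_tuesday:
            tue_n += 1
            tue_hits += has_cancer
        else:
            other_n += 1
            other_hits += has_cancer
    return two_proportion_p_value(tue_hits, tue_n, other_hits, other_n)


rng = random.Random(0)  # fixed seed so the run is repeatable
p_values = [simulate_study(rng) for _ in range(1000)]

# Count how many p-values land in each tenth of the interval [0, 1].
bins = [0] * 10
for p in p_values:
    bins[min(int(p * 10), 9)] += 1
for i, count in enumerate(bins):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f}: {count}")

fraction_significant = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"fraction of studies with p < 0.05: {fraction_significant:.3f}")
```

When you run it, compare the bin counts with each other, and note how many studies fall below 0.05 even though there is no real effect in any of them.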