Remember our general Hypothesis Testing framework:
Remember our vaccine trial data:
| SARS-CoV-2 status | Treatment: Placebo | Treatment: Vaccine | Total |
|---|---|---|---|
| Infected | 162 | 8 | 170 |
| Not Infected | 18163 | 18190 | 36353 |
| Total | 18325 | 18198 | 36523 |
Is the proportion of those who become infected different in the vaccine group from that in the placebo group? We want to formulate this as a hypothesis test.
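There are two variables in this study: Treatment and Infected. Let's recall what type and role these variables have: both are categorical, with Treatment as the explanatory variable and Infected as the response variable.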
The null hypothesis is the hypothesis we would tend to assume with no other data.
The null hypothesis here is:
The proportion of those who become infected is the same in the vaccine group and the placebo group.
The alternative hypothesis is:
The proportion of those who become infected is different in the vaccine group from that in the placebo group.
We know, of course, that the proportion of our sample in the vaccine group who became infected is smaller than the proportion in the placebo group. However, we want to make inferences from these data for the whole population. It is possible that the chances of infection are the same for those vaccinated as they are for those who received the placebo, and that we just happened to give the placebo to participants who were going to get infected.
We ask the question: if the chances of getting infected were the same for the placebo and vaccine groups, what is the probability we would see differences in our sample at least as big as the differences we actually observed?
There are two distinct approaches to computing this p-value: Fisher's exact test and the Chi-squared test.
The trial has 18325 and 18198 participants in the placebo and vaccine groups, respectively.
170 participants were infected and 36353 were not infected.
Fisher's exact test, at least conceptually, examines all possible contingency tables with these row and column totals, and calculates the probability of each one.
It then sums the probabilities of the tables in which the differences in proportion between the groups are at least as big as the one observed.
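For a 2x2 table the enumeration described above can be sketched directly. With the row and column totals fixed, each possible table is determined by the number of infected participants in the placebo group, and under the null hypothesis that count follows a hypergeometric distribution. The variable names are illustrative. The rule used here to decide which tables count as "at least as extreme" (every table whose probability is no greater than that of the observed table) is an assumption, chosen because it is the rule R's `fisher.test` applies to 2x2 tables.

```r
# Fixed margins from the trial
placebo_total  <- 18325
vaccine_total  <- 18198
infected_total <- 170

# Every possible number of infected participants in the placebo group
placebo_infected <- 0:infected_total

# Probability of each possible table under the null hypothesis
table_probs <- dhyper(placebo_infected,
                      m = placebo_total, n = vaccine_total, k = infected_total)

# Probability of the table actually observed (162 infected on placebo)
observed_prob <- dhyper(162,
                        m = placebo_total, n = vaccine_total, k = infected_total)

# Two-sided p-value: sum over tables no more probable than the observed one
# (the small relative tolerance guards against floating-point ties)
p_value <- sum(table_probs[table_probs <= observed_prob * (1 + 1e-7)])
p_value
```

Only 171 tables are possible here, so the sum is cheap; the cost grows quickly for larger tables with more rows or columns.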
The Chi-squared test works by calculating the "expected" value of each cell in the table if the proportions were equal in each group.
Overall, 170 out of 36523, or 0.004655 of the participants became infected.
There were 18325 participants in the placebo group, so assuming the null hypothesis, we'd expect, on average, 0.004655 x 18325 = 85.3 participants in the placebo group to become infected, and 18239.7 not to become infected.
Similarly, there were 18198 participants in the vaccine group, so we'd expect 0.004655 x 18198 = 84.7 to become infected, and 18113.3 not to become infected.
| Observed Data | |
|---|---|
| 162 | 8 |
| 18163 | 18190 |
| Expected Data | |
|---|---|
| 85.3 | 84.7 |
| 18239.7 | 18113.3 |
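The expected counts can be reproduced in a few lines. This is a minimal sketch with illustrative object names: each expected cell is its row total times its column total, divided by the grand total, which is the same arithmetic as multiplying the overall infection proportion by each group size.

```r
# Observed counts: rows are infection status, columns are treatment group
observed <- matrix(c(162, 18163, 8, 18190), nrow = 2,
                   dimnames = list(Infection = c("Infected", "Not Infected"),
                                   Treatment = c("Placebo", "Vaccine")))

# Expected count = (row total x column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

round(expected, 1)
```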
In the Chi-squared test, for each cell in the table we calculate \(\frac{\left(O-E\right)^2}{E}\), where \(O\) is the observed value and \(E\) is the expected value.
This is summed over all cells.
This statistic, the χ² statistic, is approximately distributed according to a distribution called the χ²-distribution with \(n\) degrees of freedom.
The number of degrees of freedom is \((r-1)(c-1)\) where \(r\) and \(c\) are the number of rows and columns in the table; here there is 1 degree of freedom.
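As a rough check, the statistic can be computed by hand. The object names below are illustrative. The second statistic subtracts 0.5 from each absolute difference before squaring; this is Yates' continuity correction, which `chisq.test` applies to 2x2 tables by default, so it is the version to compare against R's reported value.

```r
observed <- matrix(c(162, 18163, 8, 18190), nrow = 2)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Sum of (O - E)^2 / E over all four cells
chi_sq <- sum((observed - expected)^2 / expected)

# Same sum with Yates' continuity correction
chi_sq_yates <- sum((abs(observed - expected) - 0.5)^2 / expected)

# Degrees of freedom: (rows - 1) x (columns - 1)
deg_free <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(chi_sq_yates, df = deg_free, lower.tail = FALSE)

c(uncorrected = chi_sq, yates = chi_sq_yates, df = deg_free, p = p_value)
```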
Fisher's exact test calculates the exact probability of getting results at least as extreme as those observed in the data, assuming the null hypothesis is true.
However, it is computationally intensive.
For very large sample sizes, especially if there are more than two rows or columns, this can be prohibitive. Before the advent of computers, the calculation was frequently infeasible to perform.
The χ² test is an approximate test.
The approximation starts to fail if the expected value in any cell is below 5, or below 10 for 2x2 tables.
However, it is not computationally intensive.
The fisher.test function will run Fisher's exact test.
It requires a contingency table or matrix.
Let's build the data as though we're working with raw data, and then make a contingency table from it:
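```r
library(tidyverse)

# One row per participant, as though read from the raw trial records
vactrialFrame <- tibble(
  Treatment = c(rep("Placebo", 18325), rep("Vaccine", 18198)),
  Infection = c(rep("Yes", 162), rep("No", 18163), rep("Yes", 8), rep("No", 18190))
)

# Cross-tabulate Treatment (rows) against Infection (columns)
vactrialContingency <- table(vactrialFrame)
vactrialContingency
```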
Now we can run Fisher's Exact test:
```r
fisher.test(vactrialContingency)
```
Output:
```
Fisher's Exact Test for Count Data
data: vactrialContingency
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.02092083 0.09956554
sample estimates:
odds ratio
0.04930743
```
Note that Fisher's exact test also reports an odds ratio.
The odds of infection for the placebo group are \(\frac{162}{18163}\), and the odds for the vaccine group are \(\frac{8}{18190}\), so the sample odds ratio is \(\frac{\frac{8}{18190}}{\frac{162}{18163}} \approx 0.0493\), in agreement with the reported value. (fisher.test reports a conditional maximum likelihood estimate, which can differ slightly from the sample odds ratio.)
The 95% confidence interval for the odds ratio is [0.0209, 0.0996]: we are 95% confident that this interval contains the true odds ratio.
Note that if the proportions in both groups are the same, the odds are the same, and so the odds ratio would be 1. That is, the null hypothesis is that the odds ratio is 1.
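The odds ratio arithmetic is easy to check directly from the four cell counts. The variable names here are illustrative.

```r
# Odds of infection = infected / not infected, within each group
odds_placebo <- 162 / 18163
odds_vaccine <- 8 / 18190

# Odds ratio, vaccine relative to placebo
odds_ratio <- odds_vaccine / odds_placebo
round(odds_ratio, 4)
```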
The p-value for this test is reported as \(p < 2.2\times 10^{-16}\).
To run the χ² test in R, we can use the chisq.test() function:
```r
chisq.test(vactrialContingency)
```
Output:
```
Pearson's Chi-squared test with Yates' continuity correction
data: vactrialContingency
X-squared = 137.28, df = 1, p-value < 2.2e-16
```
The χ² test also gave us a p-value of \(p < 2.2 \times 10^{-16}\).
Remember, this is interpreted in the context of a null hypothesis.
The null hypothesis for the χ² test is that the response variable (infection) is independent of the explanatory variable (treatment).
Unlike Fisher's exact test, which gave an estimate of the odds ratio along with a 95% confidence interval, the χ² test gives no estimate of an odds ratio.
In the (rare) cases when you cannot use Fisher's Exact Test, the χ² test gives a p-value but no effect size.
In this case we can also use prop.test() with two proportions.
prop.test() needs the number infected and the total:
```r
prop.test(x = c(162, 8), n = c(18325, 18198))
```
This gives:
```
2-sample test for equality of proportions with continuity correction
data: c(162, 8) out of c(18325, 18198)
X-squared = 137.28, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
0.006956920 0.009844627
sample estimates:
prop 1 prop 2
0.0088403820 0.0004396087
```
Note that the output from prop.test() also gives a χ² statistic and a p-value.
These have exactly the same interpretation as in the chi-squared test.
It also gives the proportion in each group, as sample estimates. This is the number infected divided by the total in that group. The 95% confidence interval it reports is for the difference between the two proportions.
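The sample estimates are simple to reproduce by hand. The variable names in this short sketch are illustrative.

```r
# Proportion infected in each group = number infected / group total
prop_placebo <- 162 / 18325
prop_vaccine <- 8 / 18198

# Difference in proportions (placebo minus vaccine)
prop_diff <- prop_placebo - prop_vaccine

c(placebo = prop_placebo, vaccine = prop_vaccine, difference = prop_diff)
```

The difference should fall inside the confidence interval that prop.test() reports.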
Based on this trial, the Food and Drug Administration (FDA) granted Emergency Use Authorization (EUA) for the vaccine.
The FDA requires (among other criteria) that the drug be at least 50% effective in order to grant EUA. To be 50% effective, the odds in the vaccinated group must be 50% or less of the odds in the placebo group.
This means the odds ratio (vaccinated/placebo) must be at most 0.5.
The estimated odds ratio from this sample, from the Fisher's Exact Test output, was just under 0.05. So the vaccine is estimated to be about 95% effective. Furthermore, the 95% confidence interval for the odds ratio was [0.021, 0.100]. So we are 95% confident that the vaccine is at least 90% effective. At the time, this was the most effective vaccine ever trialled.
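The conversion from odds ratio to effectiveness used above can be written out explicitly. This sketch takes effectiveness to be one minus the odds ratio, following the definition in this section, and copies the estimate and interval from the fisher.test output; the variable names are illustrative.

```r
# Values copied from the fisher.test output
odds_ratio    <- 0.04930743
odds_ratio_ci <- c(0.02092083, 0.09956554)

# Effectiveness = 1 - odds ratio (the definition used in this section)
effectiveness <- 1 - odds_ratio

# The interval flips: the largest odds ratio gives the lowest effectiveness
effectiveness_ci <- rev(1 - odds_ratio_ci)

round(100 * c(estimate = effectiveness,
              lower = effectiveness_ci[1],
              upper = effectiveness_ci[2]), 1)
```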
To compare two proportions, favor using Fisher's Exact Test (fisher.test(…) in R).
If fisher.test(…) cannot be run, use chisq.test(…).
In this case you can also use prop.test(…) with two proportions, which generates more information.
Be careful! fisher.test(…) and chisq.test(…) expect contingency tables; prop.test(…) expects the value(s) and total(s).
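The difference in inputs can be seen side by side. This is a minimal sketch that builds the contingency table directly from the four counts (the object names are illustrative) and then derives the counts and totals that prop.test() needs from the same table.

```r
# Contingency table: rows are treatment groups, columns are infection status
trial_table <- matrix(c(18163, 18190, 162, 8), nrow = 2,
                      dimnames = list(Treatment = c("Placebo", "Vaccine"),
                                      Infection = c("No", "Yes")))

# fisher.test() and chisq.test() take the table itself
fisher_result <- fisher.test(trial_table)
chisq_result  <- chisq.test(trial_table)

# prop.test() takes the number infected and the group totals
infected <- trial_table[, "Yes"]
totals   <- rowSums(trial_table)
prop_result <- prop.test(x = infected, n = totals)

# The chi-squared statistics from chisq.test() and prop.test() agree
c(chisq = unname(chisq_result$statistic), prop = unname(prop_result$statistic))
```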