BMR 617: Statistical Techniques for the Biomedical Sciences

Hypothesis Testing to compare proportions

Remember our general Hypothesis Testing framework:

Comparing Proportions Example

Remember our vaccine trial data:

                               Treatment
SARS-CoV-2 status     Placebo    Vaccine      Total
Infected                  162          8        170
Not Infected            18163      18190      36353
Total                   18325      18198      36523

The question we want to ask here is:
Is the proportion of participants who become infected different in the vaccine group from that in the placebo group?
We want to formulate this as a hypothesis test.

Classifying the Variables

There are two variables in this study: Treatment and Infected. Let's recall what type and role these variables have:

What is the type and role of the "Treatment" variable?
Categorical Explanatory Variable
Quantitative Explanatory Variable
Categorical Response Variable
Quantitative Response Variable

What is the type and role of the "Infected" variable?
Categorical Explanatory Variable
Quantitative Explanatory Variable
Categorical Response Variable
Quantitative Response Variable

Using C to denote categorical and Q to denote quantitative, which summarizes the experimental design here?
C → C
C → Q
Q → C
Q → Q

Formulating the question as null and alternative hypotheses.

The null hypothesis is the hypothesis we would tend to assume with no other data.

The null hypothesis here is:

The proportion of those who become infected is the same in the vaccine group and the placebo group.

The alternative hypothesis is:

The proportion of those who become infected is different in the vaccine group to that in the placebo group.
We know, of course, that the proportion of our sample in the vaccine group who become infected is smaller than it is in our sample for those who received the placebo. However, we want to make inferences from these data for the whole population. It is possible that the chances of infection are the same for those vaccinated as they are for those who received the placebo, and that we just happened to give the placebo to participants who were going to get infected. We ask the question: if the chances of getting infected were the same for the placebo and vaccine groups, what is the probability we would see differences in our sample at least as big as the differences we actually observed?
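
The question in the paragraph above can be explored directly by simulation. The following is a minimal sketch, not part of the original analysis: it assumes the 170 infections are scattered at random among all 36523 participants (the null hypothesis) and counts how often a simulated trial shows a difference in proportions at least as big as the one observed. The variable names, the seed, and the number of simulations are illustrative choices.

```r
# Counts from the trial table
n_placebo  <- 18325
n_vaccine  <- 18198
n_infected <- 170
n_healthy  <- 36353

# Observed difference in infection proportions (placebo minus vaccine)
obs_diff <- 162 / n_placebo - 8 / n_vaccine

# Under the null hypothesis the infections land in the two groups at random,
# so the number of infections in the placebo group is hypergeometric
set.seed(617)
n_sims <- 10000
sim_placebo_infected <- rhyper(n_sims, n_infected, n_healthy, n_placebo)
sim_diff <- sim_placebo_infected / n_placebo -
  (n_infected - sim_placebo_infected) / n_vaccine

# Fraction of simulated trials with a difference at least as big as observed
sim_p <- mean(abs(sim_diff) >= abs(obs_diff))
sim_p
```

A simulation like this can only resolve probabilities down to about 1 in the number of simulations, which is one reason to use the exact and approximate tests described next.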

This probability is the p-value. There are two distinct approaches to computing it:

Fisher's Exact Test.

The trial has 18325 and 18198 participants in the placebo and vaccine groups, respectively.

170 participants were infected and 36353 were not infected.

Fisher's exact test, at least conceptually, examines all possible contingency tables with these row and column totals, and calculates the probability of each one.

It then sums the probabilities of the tables in which the differences in proportion between the groups are at least as big as the one observed.
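
For a 2x2 table with fixed totals, each possible table is determined by a single number: how many of the 170 infections fall in the placebo group. The sketch below, under that framing, enumerates all such tables with the hypergeometric distribution. It assumes the convention used by R's fisher.test() for two-sided tests, which counts a table as "at least as extreme" when it is no more probable than the observed one; the variable names are illustrative.

```r
n_placebo  <- 18325
n_vaccine  <- 18198
n_infected <- 170

# Every possible number of infections in the placebo group, given the totals
possible <- 0:n_infected

# Probability of each possible table under the null hypothesis
table_probs <- dhyper(possible, n_placebo, n_vaccine, n_infected)

# Probability of the observed table (162 infections in the placebo group)
obs_prob <- dhyper(162, n_placebo, n_vaccine, n_infected)

# Two-sided p-value: add up every table no more probable than the observed one
# (the small tolerance guards against floating-point ties)
p_manual <- sum(table_probs[table_probs <= obs_prob * (1 + 1e-7)])
p_manual
```

With only 171 possible tables this is quick; the cost grows rapidly for larger tables with more rows or columns.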

The χ2 Test

The Chi-squared test works by calculating the "expected" value of each cell in the table if the proportions were equal in each group.

Overall, 170 out of 36523, or 0.004655 of the participants became infected.

There were 18325 participants in the placebo group, so assuming the null hypothesis, we'd expect, on average, 0.004655 x 18325 = 85.3 participants in the placebo group to become infected, and 18239.7 not to become infected.

Similarly, there were 18198 participants in the vaccine group, so we'd expect 0.004655 x 18198 = 84.7 to become infected, and 18113.3 not to become infected.

The χ2 Test: Observed and Expected Data

Observed Data
                  Placebo    Vaccine
Infected              162          8
Not Infected        18163      18190

Expected Data
                  Placebo    Vaccine
Infected             85.3       84.7
Not Infected      18239.7    18113.3
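
The expected counts can be reproduced in a couple of lines of R. This is a small sketch using a hand-built matrix (the name `observed` and its dimension labels are illustrative); each expected count is the row total times the column total divided by the grand total, which is the same calculation as multiplying the overall infection proportion by the group size.

```r
# Observed counts: rows are infection status, columns are treatment groups
observed <- matrix(
  c(162, 18163, 8, 18190), nrow = 2,
  dimnames = list(Status = c("Infected", "Not Infected"),
                  Treatment = c("Placebo", "Vaccine"))
)

# Expected count = row total x column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 1)
```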

How the χ2 test works

In the Chi-squared test, for each cell in the table we calculate \(\frac{\left(O-E\right)^2}{E}\), where \(O\) is the observed value and \(E\) is the expected value.

This is summed over all cells to give the χ2 statistic: \(\chi^2 = \sum \frac{\left(O-E\right)^2}{E}\).

If the null hypothesis is true, this statistic approximately follows a distribution called the χ2-distribution with \(n\) degrees of freedom.

The number of degrees of freedom is \(n = (r-1)(c-1)\), where \(r\) and \(c\) are the number of rows and columns in the table; for our 2x2 table there is 1 degree of freedom.
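
The calculation can be carried out by hand in R. The sketch below computes the plain Pearson statistic exactly as described, with no continuity correction, and looks up the p-value from the χ2-distribution; the variable names are illustrative.

```r
observed <- matrix(c(162, 18163, 8, 18190), nrow = 2)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum of (O - E)^2 / E over all four cells
chisq_stat <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability from the chi-squared distribution
p_value <- pchisq(chisq_stat, df = df, lower.tail = FALSE)
c(statistic = chisq_stat, df = df, p_value = p_value)
```

Note that chisq.test() applies Yates' continuity correction to 2x2 tables by default, so the statistic it reports is slightly smaller than this one; calling it with correct = FALSE should reproduce the uncorrected value.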

Pros and cons of the χ2 test and Fisher's Exact Test

Fisher's exact test calculates the exact probability of getting results at least as extreme as those observed in the data, assuming the null hypothesis is true.

However, it is computationally intensive.

For very large sample sizes, especially if there are more than two rows or columns, this can be prohibitive. Before the advent of computers, the calculation was frequently infeasible to perform.

The χ2 test is an approximate test.

The approximation starts to fail if the expected value in any cell is below 5, or below 10 for 2x2 tables.

However, it is not computationally intensive.

Use Fisher's exact test if the computer can handle it; use the χ2 test otherwise.

Fisher's Exact Test in R

The fisher.test function will run Fisher's exact test. It requires a contingency table or matrix.

Let's build the data as though we're working with raw data, and then make a contingency table from it:


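# tidyverse provides tibble()
library(tidyverse)

# One row per participant: treatment group and infection status
vactrialFrame <- tibble(
  Treatment = c(rep("Placebo", 18325), rep("Vaccine", 18198)),
  Infection = c(rep("Yes", 162), rep("No", 18163), rep("Yes", 8), rep("No", 18190))
)

# Cross-tabulate Treatment (rows) against Infection (columns)
vactrialContingency <- table(vactrialFrame)
vactrialContingency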
	
Now we can run Fisher's Exact test:

fisher.test(vactrialContingency)
	

Output:


	Fisher's Exact Test for Count Data

data:  vactrialContingency
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.02092083 0.09956554
sample estimates:
odds ratio 
0.04930743 
	

Odds and the odds ratio

Note that Fisher's exact test also reports an estimate of the odds ratio.

For "rare" events, such as infection in this case, the odds are close to the risk.

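The odds for the placebo group are \(\frac{162}{18163}\), and the odds for the vaccine group are \(\frac{8}{18190}\), so the odds ratio is \(\frac{\frac{8}{18190}}{\frac{162}{18163}} = 0.0493\), matching the reported value to three significant figures. (R reports a conditional maximum likelihood estimate rather than the sample odds ratio, so later digits differ slightly.)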

The 95% confidence interval for the odds ratio is [0.0209, 0.0996].

We are 95% confident the interval [0.0209, 0.0996] contains the true odds ratio.

Note that if the proportions in both groups are the same, the odds are the same, and so the odds ratio would be 1. I.e., the null hypothesis is that the odds ratio is 1.
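
The arithmetic above can be checked directly. This short sketch computes the odds and the risk in each group from the table counts, to show how close the two are for a rare event, and then forms the sample odds ratio; the variable names are illustrative.

```r
# Odds: infected / not infected
odds_placebo <- 162 / 18163
odds_vaccine <- 8 / 18190

# Risk: infected / total in group
risk_placebo <- 162 / 18325
risk_vaccine <- 8 / 18198

# For rare events the odds and the risk are nearly identical
rbind(odds = c(placebo = odds_placebo, vaccine = odds_vaccine),
      risk = c(placebo = risk_placebo, vaccine = risk_vaccine))

# Sample odds ratio (vaccine relative to placebo)
odds_ratio <- odds_vaccine / odds_placebo
odds_ratio
```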

The p-value

The p-value for this test is reported as \(p < 2.2\times 10^{-16}\). This is the smallest bound R prints by default, not the p-value itself.
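
If the numerical p-value is needed, it can be pulled out of the object that fisher.test() returns. The sketch below builds the contingency table as a matrix in the same orientation as the table used earlier (treatment in rows, infection status in columns); the names `tab` and `fisher_result` are illustrative.

```r
# Rows: treatment group; columns: infection status (filled column by column)
tab <- matrix(
  c(18163, 18190, 162, 8), nrow = 2,
  dimnames = list(Treatment = c("Placebo", "Vaccine"),
                  Infection = c("No", "Yes"))
)

fisher_result <- fisher.test(tab)

# Components of the result object
fisher_result$p.value    # numerical p-value
fisher_result$estimate   # estimated odds ratio
fisher_result$conf.int   # 95% confidence interval for the odds ratio
```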

χ2 Test in R

To run the χ2 test in R, we can use the chisq.test() function:


chisq.test(vactrialContingency)
	

Output:


	Pearson's Chi-squared test with Yates' continuity correction

data:  vactrialContingency
X-squared = 137.28, df = 1, p-value < 2.2e-16
	

The χ2 test also gave us a p-value of less than \(2.2 \times 10^{-16}\).

Remember, this is interpreted in the context of a null hypothesis.

The null hypothesis for the χ2 test is that the response variable (infection) is independent of the explanatory variable (treatment).

Unlike Fisher's exact test, the χ2 test gives no estimate of an odds ratio.

Effect Size

Fisher's exact test gave an estimate of the odds ratio, along with a 95% confidence interval.

Effect Size When you Cannot Use Fisher's Exact Test

In the (rare) cases when you cannot use Fisher's Exact Test, the χ2 test gives a p-value but no effect size. In this case we can also use prop.test() with two proportions.

prop.test() needs the number infected and the total:


	prop.test(x=c(162, 8), n=c(18325, 18198))
	
This gives:

	2-sample test for equality of proportions with continuity correction

data:  c(162, 8) out of c(18325, 18198)
X-squared = 137.28, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
 0.006956920 0.009844627
sample estimates:
      prop 1       prop 2 
0.0088403820 0.0004396087 
	
Note that the output from prop.test() also gives a χ2 statistic and a p-value. These have exactly the same interpretation as in the χ2 test.

It also gives the proportion in each group, as sample estimates. This is the number infected divided by the total in that group.

The difference in the risk is the attributable risk: \[0.00884 - 0.00044 = 0.00840\] The confidence interval is the 95% confidence interval for the attributable risk: we are 95% confident the range [0.00696, 0.00984] contains the true attributable risk.
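
The attributable risk and its confidence interval can be taken straight from the object returned by prop.test(), rather than copied from the printed output. A minimal sketch, with illustrative variable names:

```r
prop_result <- prop.test(x = c(162, 8), n = c(18325, 18198))

# Sample proportions infected: placebo first, then vaccine
prop_result$estimate

# Attributable risk: difference between the two proportions
attributable_risk <- unname(prop_result$estimate[1] - prop_result$estimate[2])
attributable_risk

# 95% confidence interval for the attributable risk
prop_result$conf.int
```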

Note on odds ratio and FDA requirements for Emergency Use

Based on this trial, the Food and Drug Administration (FDA) granted Emergency Use Authorization (EUA) for the vaccine.

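The FDA requires (among other criteria) that the vaccine be at least 50% effective in order to grant an EUA. Efficacy is formally defined in terms of the ratio of risks, but because infection is rare here, the odds ratio is a close approximation. On that basis, to be 50% effective, the odds in the vaccinated group must be 50% or less of the odds in the placebo group.

This means the odds ratio (vaccinated/placebo) must be at most 0.5.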

The estimated odds ratio from this sample, from the Fisher's Exact Test data, was just under 0.05. So the vaccine is estimated to be 95% effective. Furthermore, the 95% confidence interval for this was [0.021, 0.100]. So we are 95% confident that the vaccine is at least 90% effective. At the time, this was the most effective vaccine ever trialled.
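
The efficacy figures quoted here follow from subtracting the odds ratio, and the ends of its confidence interval, from 1. The sketch below does that using the fisher.test() result, and for comparison also computes efficacy from the ratio of risks. The risk-based comparison is an addition for illustration, and the variable names are assumptions.

```r
# Rows: treatment group; columns: infection status (filled column by column)
tab <- matrix(
  c(18163, 18190, 162, 8), nrow = 2,
  dimnames = list(Treatment = c("Placebo", "Vaccine"),
                  Infection = c("No", "Yes"))
)
fisher_result <- fisher.test(tab)

# Efficacy estimated as 1 - odds ratio
efficacy_or <- 1 - unname(fisher_result$estimate)

# The upper end of the odds ratio interval gives the lower bound on efficacy
efficacy_ci <- rev(1 - as.numeric(fisher_result$conf.int))

# For comparison: efficacy as 1 - risk ratio (vaccine risk / placebo risk)
efficacy_rr <- 1 - (8 / 18198) / (162 / 18325)

round(c(efficacy_or = efficacy_or,
        lower = efficacy_ci[1], upper = efficacy_ci[2],
        efficacy_rr = efficacy_rr), 3)
```

Because infection is rare in both groups, the odds-based and risk-based estimates should agree closely.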

Summary

To compare two proportions, favor using Fisher's Exact Test (fisher.test(…) in R).

If fisher.test(…) cannot be run, use chisq.test(…). In this case you can also use prop.test(…) with two proportions, which generates more information.

Be careful! fisher.test(…) and chisq.test(…) expect contingency tables; prop.test(…) expects the value(s) and total(s).
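
The difference in inputs is easiest to see side by side. This sketch runs all three functions on the trial data, building the contingency table directly as a matrix instead of from raw data; the name `tab` is illustrative.

```r
# Contingency table for fisher.test() and chisq.test()
# Rows: treatment group; columns: infection status (filled column by column)
tab <- matrix(
  c(18163, 18190, 162, 8), nrow = 2,
  dimnames = list(Treatment = c("Placebo", "Vaccine"),
                  Infection = c("No", "Yes"))
)

p_fisher <- fisher.test(tab)$p.value
p_chisq  <- chisq.test(tab)$p.value

# prop.test() takes the number infected and the group totals instead
p_prop <- prop.test(x = c(162, 8), n = c(18325, 18198))$p.value

c(fisher = p_fisher, chisq = p_chisq, prop = p_prop)
```

For a 2x2 table, chisq.test() and prop.test() apply the same continuity-corrected χ2 calculation, so their p-values should match.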