BMR 617: Statistical Techniques for the Biomedical Sciences

Hypothesis Testing to compare proportions

Remember our general Hypothesis Testing framework:

1. We form a hypothesis we want to test. We state this as a competition between two hypotheses:
   - The null hypothesis is the "default"; it's what we would believe without any evidence to the contrary.
   - The alternative hypothesis is the opposite of the null hypothesis. It's (almost always) the hypothesis for which we want to provide strong evidence.
2. We collect data from an experiment or study designed to test the hypothesis.
3. We compute a test statistic from our data. The test statistic is designed to have a known distribution if the null hypothesis is true.
4. We calculate the probability of obtaining data at least as extreme as the data we actually obtained, were the null hypothesis to be true. This is called the p-value.
5. If the p-value is "small", i.e. if it would be very unlikely that we would obtain data like ours if the null hypothesis were true, we reject the null hypothesis in favor of the alternative hypothesis.

Comparing Proportions Example

Remember our vaccine trial data:

                               Treatment
SARS-CoV-2 status    Placebo    Vaccine    Total
Infected                 162          8      170
Not Infected           18163      18190    36353
Total                  18325      18198    36523

The question we want to ask here is

Does the proportion of those who become infected differ between the vaccine group and the placebo group?

We want to formulate this as a hypothesis test.

Classifying the Variables

There are two variables in this study: Treatment and Infected. Let's recall what type and role these variables have:

What is the type and role of the "Treatment" variable?

- Categorical Explanatory Variable
- Quantitative Explanatory Variable
- Categorical Response Variable
- Quantitative Response Variable

Answer: Categorical Explanatory Variable.

It is not quantitative. Remember a quantitative variable is one whose values are numeric, representing some kind of measurement. The values of "Treatment" are "Placebo" or "Vaccine".

It is not a response variable. A response variable represents the outcome of an experiment or study. The outcome here is whether or not the patient becomes infected, i.e. it's the "Infected" variable.

What is the type and role of the "Infected" variable?

- Categorical Explanatory Variable
- Quantitative Explanatory Variable
- Categorical Response Variable
- Quantitative Response Variable

Answer: Categorical Response Variable.

It is not quantitative. Remember a quantitative variable is one whose values are numeric, representing some kind of measurement. The values of "Infected" are "Infected" or "Not Infected".

It is not an explanatory variable. A response variable represents the outcome of an experiment or study. An explanatory variable is one whose value is hypothesized to affect the value of the response variable. Here, infection is the outcome.

Using C to denote categorical and Q to denote quantitative, which summarizes the experimental design here?

- C → C
- C → Q
- Q → C
- Q → Q

Answer: C → C. Both the explanatory and response variables in this case are categorical.

Formulating the question as null and alternative hypotheses.

The null hypothesis is the hypothesis we would tend to assume with no other data.

The null hypothesis here is:

The proportion of those who become infected is the same in the vaccine group and the placebo group.

The alternative hypothesis is:

The proportion of those who become infected is different in the vaccine group to that in the placebo group.

We know, of course, that the proportion of our sample in the vaccine group who become infected is smaller than it is in our sample for those who received the placebo. However, we want to make inferences from these data for the whole population. It is possible that the chances of infection are the same for those vaccinated as they are for those who received the placebo, and that we just happened to give the placebo to participants who were going to get infected. We ask the question: if the chances of getting infected were the same for the placebo and vaccine groups, what is the probability we would see differences in our sample at least as big as the differences we actually observed?
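The question above can be explored by simulation before turning to a formal test. The following is a minimal sketch, not part of the original analysis: it assumes the group sizes and the total number of infections are fixed, and it uses base R's `rhyper()` to scatter the 170 infections at random across the two groups. The seed and the number of simulations are arbitrary choices.

```r
# Sketch: simulate the null hypothesis by assigning the 170 infections at random
# across all 36523 participants and counting how many land in the placebo group.
set.seed(617)  # arbitrary seed, for reproducibility only

n_placebo  <- 18325
n_vaccine  <- 18198
n_infected <- 170
n_healthy  <- n_placebo + n_vaccine - n_infected  # 36353 not infected

# Observed difference in infection proportions (placebo minus vaccine)
obs_diff <- 162 / n_placebo - 8 / n_vaccine

# Under the null, the number of infected in the placebo group is hypergeometric
n_sims <- 10000  # arbitrary number of simulated trials
sim_placebo_inf <- rhyper(n_sims, n_infected, n_healthy, n_placebo)
sim_diff <- sim_placebo_inf / n_placebo -
  (n_infected - sim_placebo_inf) / n_vaccine

# Fraction of simulated trials with a difference at least as big as observed
sim_p <- mean(abs(sim_diff) >= abs(obs_diff))
sim_p
```

A simulation of this size can only show that the p-value is very small; the two tests described next compute it directly.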

There are two distinct approaches to computing this p-value:

Fisher's exact test
The χ² ("Chi-squared") test.

Fisher's Exact Test.

The trial has 18325 and 18198 participants in the placebo and vaccine groups, respectively.

170 participants were infected and 36353 were not infected.

Fisher's exact test, at least conceptually, examines all possible contingency tables with these row and column totals, and calculates the probability of each one.

It then sums the probabilities of the tables in which the differences in proportion between the groups are at least as big as the one observed.
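The idea can be sketched by hand with the hypergeometric distribution. This is an illustration under one stated assumption: "at least as extreme" is taken to mean "no more probable than the observed table", which is the convention R's `fisher.test()` documents for two-sided tests. The variable names are chosen for this example only.

```r
n_placebo  <- 18325
n_infected <- 170
n_healthy  <- 36353

# With the margins fixed, each table is determined by the number of infected
# participants in the placebo group, which can range from 0 to 170.
placebo_infected <- 0:n_infected
table_prob <- dhyper(placebo_infected, n_infected, n_healthy, n_placebo)

# Probability of the observed table (162 infected in the placebo group)
obs_prob <- table_prob[placebo_infected == 162]

# Two-sided p-value: sum over tables no more probable than the observed one.
# The small tolerance guards against floating-point ties.
p_manual <- sum(table_prob[table_prob <= obs_prob * (1 + 1e-7)])

# Compare with R's built-in test on the same counts
observed <- matrix(c(18163, 18190, 162, 8), nrow = 2,
                   dimnames = list(Treatment = c("Placebo", "Vaccine"),
                                   Infection = c("No", "Yes")))
p_builtin <- fisher.test(observed)$p.value
c(manual = p_manual, builtin = p_builtin)
```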

The χ² Test

The Chi-squared test works by calculating the "expected" value of each cell in the table if the proportions were equal in each group.

Overall, 170 out of 36523, or 0.004655 of the participants became infected.

There were 18325 participants in the placebo group, so assuming the null hypothesis, we'd expect, on average, 0.004655 x 18325 = 85.3 participants in the placebo group to become infected, and 18239.7 not to become infected.

Similarly, there were 18198 participants in the vaccine group, so we'd expect 0.004655 x 18198 = 84.7 to become infected, and 18113.3 not to become infected.
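The same arithmetic can be done for all four cells at once: each expected count is (row total × column total) / grand total. A minimal sketch in base R, with the matrix entered by hand for this example:

```r
# Observed counts: rows are infection status, columns are treatment
observed <- matrix(c(162, 18163, 8, 18190), nrow = 2,
                   dimnames = list(Status = c("Infected", "Not Infected"),
                                   Treatment = c("Placebo", "Vaccine")))

# Expected count in each cell under the null hypothesis
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 1)
```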

The χ² Test: Observed and Expected Data

Observed Data
                Placebo    Vaccine
Infected            162          8
Not Infected      18163      18190

Expected Data
                Placebo    Vaccine
Infected           85.3       84.7
Not Infected    18239.7    18113.3

How the χ² test works

In the Chi-squared test, for each cell in the table we calculate \(\frac{\left(O-E\right)^2}{E}\), where \(O\) is the observed value and \(E\) is the expected value.

This is summed over all cells.

If the null hypothesis is true, this sum, the χ² statistic, approximately follows a distribution called the χ²-distribution with \(n\) degrees of freedom.

The number of degrees of freedom is \((r-1)(c-1)\) where \(r\) and \(c\) are the number of rows and columns in the table; here there is 1 degree of freedom.
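The calculation described above can be sketched directly in base R. This version applies no continuity correction, so it is compared against `chisq.test()` with `correct = FALSE`; the variable names are chosen for this example.

```r
observed <- matrix(c(162, 18163, 8, 18190), nrow = 2,
                   dimnames = list(Status = c("Infected", "Not Infected"),
                                   Treatment = c("Placebo", "Vaccine")))
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Sum (O - E)^2 / E over all four cells
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (r - 1)(c - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: upper tail of the chi-squared distribution with df degrees of freedom
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

c(chi_sq = chi_sq, df = df, p_value = p_value)
```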

Pros and cons of the χ² test and Fisher's Exact Test

Fisher's exact test calculates the exact probability of getting results at least as extreme as those observed in the data, assuming the null hypothesis is true.

However, it is computationally intensive.

For very large sample sizes, especially if there are more than two rows or columns, this can be prohibitive. Before the advent of computers, the calculation was frequently infeasible to perform.

The χ² test is an approximate test.

The approximation starts to fail if the expected value in any cell is below 5, or below 10 for 2x2 tables.

However it is not computationally intensive.

Use Fisher's exact test if the computer can handle it; use the χ² test otherwise.
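If you do fall back on the χ² test, one way to check the rule of thumb above is to inspect the expected counts that `chisq.test()` returns. This is a small sketch; the threshold of 10 is simply the 2x2 rule quoted above.

```r
observed <- matrix(c(162, 18163, 8, 18190), nrow = 2,
                   dimnames = list(Status = c("Infected", "Not Infected"),
                                   Treatment = c("Placebo", "Vaccine")))

# chisq.test() stores the expected counts it used
expected <- chisq.test(observed)$expected

# Smallest expected count, and whether it clears the 2x2 rule of thumb
min(expected)
min(expected) >= 10
```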

Fisher's Exact Test in R

The fisher.test function will run Fisher's exact test. It requires a contingency table or matrix.

Let's build the data as though we're working with raw data, and then make a contingency table from it:


library(tidyverse)
vactrialFrame <- tibble(
  Treatment = c(rep("Placebo", 18325), rep("Vaccine", 18198)),
  Infection = c(rep("Yes", 162), rep("No", 18163), rep("Yes", 8), rep("No", 18190))
)


vactrialContingency <- table(vactrialFrame)
vactrialContingency
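When only the counts are available rather than raw data, an equivalent table can be entered directly as a matrix. This is a sketch in base R; the name `vactrialCounts` is hypothetical and is not used elsewhere in these notes. The layout mirrors the table built above: rows are treatments, columns are infection status in alphabetical order.

```r
# Counts entered column by column: first the "No" column, then the "Yes" column
vactrialCounts <- matrix(c(18163, 18190, 162, 8), nrow = 2,
                         dimnames = list(Treatment = c("Placebo", "Vaccine"),
                                         Infection = c("No", "Yes")))
vactrialCounts
```

Both `fisher.test()` and `chisq.test()` accept a matrix like this in place of a table.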

Now we can run Fisher's Exact test:


fisher.test(vactrialContingency)

Output:


	Fisher's Exact Test for Count Data

data:  vactrialContingency
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.02092083 0.09956554
sample estimates:
odds ratio 
0.04930743

Odds and the odds ratio

Note that Fisher's exact test generates an odds ratio.

The odds are the number infected divided by the number not infected (not divided by the total!).

For "rare" events, such as infection in this case, the odds are close to the risk.

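The odds for the placebo group are \(\frac{162}{18163}\), and the odds for the vaccine group are \(\frac{8}{18190}\), so the odds ratio is \(\frac{\frac{8}{18190}}{\frac{162}{18163}} = 0.0493\), matching the reported estimate to four decimal places. (fisher.test() reports a conditional maximum likelihood estimate rather than this sample odds ratio, so the later digits differ slightly.)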

The 95% confidence interval for the odds ratio is [0.0209, 0.0996].

We are 95% confident the interval [0.0209, 0.0996] contains the true odds ratio.

Note that if the proportions in both groups are the same, the odds are the same, and so the odds ratio would be 1. I.e., the null hypothesis is that the odds ratio is 1.
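The odds, risks, and their ratios can be computed by hand from the four counts. A short sketch, with variable names chosen for this example, showing how close the odds are to the risk when the event is rare:

```r
# Odds: infected divided by not infected
odds_placebo <- 162 / 18163
odds_vaccine <- 8 / 18190

# Risk: infected divided by the group total
risk_placebo <- 162 / 18325
risk_vaccine <- 8 / 18198

# Ratios, vaccine relative to placebo
odds_ratio <- odds_vaccine / odds_placebo
risk_ratio <- risk_vaccine / risk_placebo

round(c(odds_placebo = odds_placebo, risk_placebo = risk_placebo,
        odds_ratio = odds_ratio, risk_ratio = risk_ratio), 4)
```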

The p-value

The p-value for this test is reported as \(p < 2.2\times 10^{-16}\)

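The value \(2.2\times 10^{-16}\) is the machine precision of R's floating-point numbers (.Machine$double.eps): the smallest number that changes the result when added to 1. R prints any p-value smaller than this as "< 2.2e-16".
So this is really saying the p-value is too small to be worth reporting precisely; for practical purposes it is indistinguishable from zero.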
Remember the p-value is always associated with a null hypothesis. The null hypothesis for the Fisher's exact test is that the odds ratio is 1.
If the chances of infection were the same for the vaccine group and the placebo group, the probability we would see data with this much difference between the groups in the clinical trial is less than \(2.2\times 10^{-16}\), i.e. it is essentially zero.
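The printed output hides the underlying number, but it is stored in the result object. A brief sketch of how to retrieve it, with the counts entered by hand as a matrix for this example:

```r
observed <- matrix(c(18163, 18190, 162, 8), nrow = 2,
                   dimnames = list(Treatment = c("Placebo", "Vaccine"),
                                   Infection = c("No", "Yes")))

# The p-value stored in the test result, without the print threshold
p_value <- fisher.test(observed)$p.value
p_value

# The threshold below which R prints "< 2.2e-16"
.Machine$double.eps
p_value < .Machine$double.eps
```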

χ² Test in R

To run the χ² test in R, we can use the chisq.test() function:


chisq.test(vactrialContingency)

Output:


	Pearson's Chi-squared test with Yates' continuity correction

data:  vactrialContingency
X-squared = 137.28, df = 1, p-value < 2.2e-16

The χ² test also gave us a p-value less than \(2.2 \times 10^{-16}\).

Remember, this is interpreted in the context of a null hypothesis.

The null hypothesis for the χ² test is that the response variable (infection) is independent of the explanatory variable (treatment).

There is no estimate of an odds ratio.
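The output mentions Yates' continuity correction, which chisq.test() applies by default to 2x2 tables. As a sketch of what that means, the reported statistic can be reproduced by shrinking each |O − E| by 0.5 before squaring (R caps the adjustment at |O − E| itself, which does not matter here because every difference is far larger than 0.5):

```r
observed <- matrix(c(18163, 18190, 162, 8), nrow = 2,
                   dimnames = list(Treatment = c("Placebo", "Vaccine"),
                                   Infection = c("No", "Yes")))
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Yates-corrected statistic: subtract 0.5 from each |O - E| before squaring
chi_sq_yates <- sum((abs(observed - expected) - 0.5)^2 / expected)
round(chi_sq_yates, 2)

# The uncorrected statistic can be requested with correct = FALSE
chisq.test(observed, correct = FALSE)$statistic
```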

Effect Size

Fisher's exact test gave an estimate of the odds ratio, along with a 95% confidence interval.

This is useful information.
The p-value merely tells us how likely data at least this extreme would be if there were no difference between the placebo and vaccinated groups.
However, we would also like to know how different they are.
The odds ratio tells us this.
It is a measure of effect size: how much effect does the treatment have?

Effect Size When you Cannot Use Fisher's Exact Test

In the (rare) cases when you cannot use Fisher's Exact Test, the χ² test gives a p-value but no effect size. In this case we can also use prop.test() with two proportions.

prop.test() needs the number infected and the total:


	prop.test(x=c(162, 8), n=c(18325, 18198))

This gives:


	2-sample test for equality of proportions with continuity correction

data:  c(162, 8) out of c(18325, 18198)
X-squared = 137.28, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
 0.006956920 0.009844627
sample estimates:
      prop 1       prop 2 
0.0088403820 0.0004396087

Note the output from prop.test() also gives a χ² statistic, and a p-value. These have exactly the same interpretation as the chi-squared test.

It also gives the proportion in each group, as sample estimates. This is the number infected divided by the total in that group, i.e. the risk.

The difference in the risk is the attributable risk: \[0.00884 - 0.00044 = 0.00840\] The confidence interval is the 95% confidence interval for the attributable risk: we are 95% confident the range [0.00696, 0.00984] contains the true attributable risk.
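These quantities can also be pulled out of the prop.test() result rather than read off the printout. A minimal sketch, with variable names chosen for this example:

```r
result <- prop.test(x = c(162, 8), n = c(18325, 18198))

# Risk in each group (placebo first, then vaccine)
risks <- unname(result$estimate)

# Attributable risk: the difference between the two risks
attributable_risk <- risks[1] - risks[2]

# 95% confidence interval for that difference
ci <- as.numeric(result$conf.int)

round(c(attributable_risk = attributable_risk,
        lower = ci[1], upper = ci[2]), 5)
```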

Note on odds ratio and FDA requirements for Emergency Use

Based on this trial, the Food and Drug Administration (FDA) granted Emergency Use Authorization (EUA) for the vaccine.

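The FDA requires (among other criteria) that the drug be at least 50% effective in order to grant EUA. Strictly, efficacy is defined from the ratio of risks, but for rare events such as infection here the odds ratio is a close approximation. So, to be 50% effective, the odds in the vaccinated group must be 50% or less of the odds in the placebo group.

This means the odds ratio (vaccinated/placebo) must be at most 0.5.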

The estimated odds ratio from this sample, from the Fisher's Exact Test data, was just under 0.05. So the vaccine is estimated to be 95% effective. Furthermore, the 95% confidence interval for this was [0.021, 0.100]. So we are 95% confident that the vaccine is at least 90% effective. At the time, this was the most effective vaccine ever trialled.
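The arithmetic behind these statements can be sketched in R. This follows the approximation used in these notes, efficacy ≈ 1 − odds ratio, which is an assumption of the example rather than the FDA's formal definition.

```r
observed <- matrix(c(18163, 18190, 162, 8), nrow = 2,
                   dimnames = list(Treatment = c("Placebo", "Vaccine"),
                                   Infection = c("No", "Yes")))
ft <- fisher.test(observed)

# Estimated efficacy, approximated as 1 minus the odds ratio
efficacy <- 1 - unname(ft$estimate)

# The interval flips: the upper odds-ratio bound gives the lower efficacy bound
efficacy_ci <- 1 - rev(as.numeric(ft$conf.int))

round(c(efficacy = efficacy,
        lower = efficacy_ci[1], upper = efficacy_ci[2]), 3)
```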

Summary

To compare two proportions, favor using Fisher's Exact Test (fisher.test(…) in R).

If fisher.test(…) cannot be run, use chisq.test(…). In this case you can also use prop.test(…) with two proportions, which generates more information.

Be careful! fisher.test(…) and chisq.test(…) expect contingency tables; prop.test(…) expects the value(s) and total(s).
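The difference in inputs can be seen side by side in the sketch below, which uses the trial counts entered by hand. It also illustrates that, for two groups, chisq.test() and prop.test() report the same χ² statistic.

```r
# fisher.test() and chisq.test() take the full 2x2 table of counts
observed <- matrix(c(18163, 18190, 162, 8), nrow = 2,
                   dimnames = list(Treatment = c("Placebo", "Vaccine"),
                                   Infection = c("No", "Yes")))
fisher_result <- fisher.test(observed)
chisq_result  <- chisq.test(observed)

# prop.test() takes the number infected (x) and the group totals (n)
prop_result <- prop.test(x = c(162, 8), n = c(18325, 18198))

# The chi-squared statistics from the last two calls agree
c(chisq = unname(chisq_result$statistic),
  prop  = unname(prop_result$statistic))
```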
