BMR 617: Statistical Techniques for the Biomedical Sciences

Hypothesis Testing for a Single Proportion

Last time we introduced the general framework for hypothesis testing.

Let's look at a fairly simple case: testing whether a single proportion is the same as, or different to, some fixed value.

Single Proportion Example

In a study of Neonatal Abstinence Syndrome, DNA samples were taken from 34 pregnant women in West Virginia with substance use disorder and sequenced.

For the variant rs6972158 in the gene NPSR1, 27 of the 68 alleles (two per woman) were the variant allele G, and the remaining 41 were the reference allele A.

According to the 1000 Genomes Project, the G allele frequency in the American population is 0.235.

We want to know whether the variant allele frequency in this population of pregnant women in West Virginia with substance use disorder is different to that of the general American population.
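Before running any formal test, it can help to compare the observed number of G alleles with the number we would expect if the study population had the 1000 Genomes frequency. A minimal sketch in base R; the variable names are our own, chosen only for illustration:

```r
# Counts from the study: 34 women, two alleles each
n_women   <- 34
n_alleles <- 2 * n_women   # 68 alleles in total
x_G       <- 27            # observed G alleles

# Reference G allele frequency (1000 Genomes, American population)
p0 <- 0.235

# Observed G allele frequency in the study sample
p_hat <- x_G / n_alleles
print(round(p_hat, 3))

# Expected number of G alleles if the study population matched the reference
expected_G <- n_alleles * p0
print(expected_G)
```

We observe 27 G alleles where about 16 would be expected under the reference frequency; the hypothesis test asks whether a gap this large is plausible by chance alone.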

Formulating the question as null and alternative hypotheses

The null hypothesis is the hypothesis we would tend to assume with no other data. The alternative hypothesis is typically the hypothesis we would like to "prove" (or provide evidence to support).

The null hypothesis here is:

The G allele frequency in the study population is 0.235

The alternative hypothesis is:

The G allele frequency in the study population is not 0.235
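Writing p for the G allele frequency in the study population, the same pair of hypotheses can be stated compactly:

```latex
H_0 : p = 0.235 \qquad \text{versus} \qquad H_1 : p \neq 0.235
```

Because H_1 allows p to lie either above or below 0.235, this is a two-sided test.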

R code for testing a value of a single proportion

In R, we can test the null hypothesis that a proportion is equal to some given value using the prop.test() function.

To see how to use the function, type ?prop.test in the console, or search for prop.test in the help tab.

Which parameters are required for the prop.test function, and which are optional?
x, n, and p are required, alternative, conf.level, and correct are optional.
x and n are required, p, alternative, conf.level, and correct are optional.
x, n, p, alternative, conf.level, and correct are all required.
x, n, p, alternative, conf.level, and correct are all optional.

Read the "Details" section of the help, in particular the third paragraph (in our example, we only have one group). Should we provide a value for p here? If so, what should it be?
We should not provide a value for p
We should provide a value for p of 0.5
We should provide a value for p of 0.235
We should provide a value for p of NULL

In our case, we have 27 G alleles out of a total of 68 alleles, and the probability of a G allele under the null hypothesis is 0.235:


prop.test(x = 27, n = 68, p = 0.235)

Run this test. You should see the following output:


	1-sample proportions test with continuity correction

data:  27 out of 68, null probability 0.235
X-squared = 9.053, df = 1, p-value = 0.002623
alternative hypothesis: true p is not equal to 0.235
95 percent confidence interval:
 0.2826780 0.5231249
sample estimates:
        p 
0.3970588
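To see where the X-squared value and p-value come from, the sketch below recomputes them by hand. As we understand prop.test()'s one-sample case, the statistic with continuity correction is (|x - n*p0| - 0.5)^2 / (n*p0*(1 - p0)), with the 0.5 correction capped at |x - n*p0|, and it is compared against a chi-squared distribution with 1 degree of freedom. Treat this as a check on the function rather than a replacement for it; the variable names are illustrative.

```r
x  <- 27      # observed G alleles
n  <- 68      # total alleles
p0 <- 0.235   # G allele frequency under the null hypothesis

# Yates continuity correction: 0.5, but never more than |x - n*p0|
# (assumption: mirrors prop.test's one-sample behaviour)
yates <- min(0.5, abs(x - n * p0))

# Chi-squared statistic for one proportion with continuity correction
x_squared <- (abs(x - n * p0) - yates)^2 / (n * p0 * (1 - p0))

# Two-sided p-value: upper tail of chi-squared with 1 df
p_value <- pchisq(x_squared, df = 1, lower.tail = FALSE)

print(round(x_squared, 3))
print(signif(p_value, 4))

# Compare with the values reported by prop.test()
res <- prop.test(x, n, p = p0)
print(isTRUE(all.equal(unname(res$statistic), x_squared)))
print(isTRUE(all.equal(unname(res$p.value), p_value)))
```

The rounded values should match the X-squared and p-value lines in the output above.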
	

Interpreting the output

The sample estimate (i.e. the estimate of the probability of a G allele from the sample) is 27/68 = 0.397.

The 95% confidence interval for the G allele frequency is [0.283, 0.523].

The p-value is 0.002623. This means that if the null hypothesis were true, there would be a 0.2623% chance of seeing data at least as extreme as what we observed.
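Rather than reading these values off the printed output, we can store the result of prop.test() in a variable. It returns an object of class "htest", a list whose components include estimate, conf.int and p.value. A minimal sketch (the names res, p_hat, ci and p_val are our own):

```r
# Store the test result instead of just printing it
res <- prop.test(x = 27, n = 68, p = 0.235)

# Sample estimate of the G allele frequency (27/68)
p_hat <- unname(res$estimate)

# 95% confidence interval (lower, upper)
ci <- as.numeric(res$conf.int)

# p-value for H0: p = 0.235 against a two-sided alternative
p_val <- res$p.value

cat("Estimate:", round(p_hat, 3), "\n")
cat("95% CI:  ", round(ci, 3), "\n")
cat("p-value: ", signif(p_val, 4), "\n")
```

Keeping the result as an object makes it easy to reuse these numbers in later calculations or reports.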

Interpreting the result

Since the p-value of 0.002623 is less than our predetermined threshold of 0.05, we would reject the null hypothesis and conclude that the proportion of G alleles in this population is different to 0.235.

Since the entire 95% confidence interval lies above 0.235, we would further conclude that the G allele frequency in this population is greater than 0.235.
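The two-step reasoning above (first the p-value against the threshold, then the position of the confidence interval for direction) can be written out explicitly. This is one possible sketch, assuming a significance level of 0.05 so that it pairs with the default 95% interval; the variable names and messages are illustrative only.

```r
res   <- prop.test(x = 27, n = 68, p = 0.235)
alpha <- 0.05    # threshold chosen before seeing the data (matches 95% CI)
p0    <- 0.235   # null value of the G allele frequency

# Step 1: reject H0 if the p-value falls below alpha
reject_H0 <- res$p.value < alpha

# Step 2: does the whole 95% CI lie above (or below) the null value?
ci       <- res$conf.int
ci_above <- ci[1] > p0
ci_below <- ci[2] < p0

if (reject_H0 && ci_above) {
  message("Reject H0: allele frequency appears greater than ", p0)
} else if (reject_H0 && ci_below) {
  message("Reject H0: allele frequency appears less than ", p0)
} else if (reject_H0) {
  # Fallback in case the interval still contains p0
  message("Reject H0: allele frequency differs from ", p0)
} else {
  message("Fail to reject H0 at alpha = ", alpha)
}
```

For this data, the first branch applies: the p-value is below 0.05 and the lower end of the interval (about 0.283) is above 0.235.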

Some possible explanations: