Last time we introduced Hypothesis Testing. The framework is as follows:
Let's look at a fairly simple case. We'll consider a single proportion and test whether it is the same as, or different to, some fixed value.
In a study of Neonatal Abstinence Syndrome, DNA samples were taken from 34 pregnant women in West Virginia with substance use disorder and sequenced.
For the variant rs6972158 in the gene NPSR1, 27 of the 68 alleles were found to be the variant allele G (41 were the reference allele A).
According to the 1000 Genomes Project, the G allele frequency in the American population is 0.235.
We want to know whether the variant allele frequency in this population of pregnant women in West Virginia with substance use disorder is different to that of the general American population.
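As a small sketch, the counts above can be recorded as R variables. The variable names are our own choice, not part of the study. Each woman carries two alleles, which is where the total of 68 comes from.

```r
# Summary counts for rs6972158 (NPSR1) in the study sample
n_women   <- 34
n_alleles <- 2 * n_women   # two alleles per person: 68 in total
x_G       <- 27            # alleles observed to be the variant G
p0        <- 0.235         # G allele frequency in the American population (1000 Genomes)

p_hat <- x_G / n_alleles   # sample proportion of G alleles
n_alleles                  # 68
n_alleles - x_G            # 41 reference (A) alleles
round(p_hat, 3)            # 0.397
```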
The null hypothesis is the hypothesis we would tend to assume with no other data. The alternative hypothesis is typically the hypothesis we would like to "prove" (or provide evidence to support).
The null hypothesis here is:
The G allele frequency in the study population is 0.235
The alternative hypothesis is:
The G allele frequency in the study population is not 0.235
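Writing p for the (unknown) G allele frequency in the study population, the two hypotheses can be stated compactly:

```latex
H_0 : p = 0.235 \qquad \text{versus} \qquad H_1 : p \neq 0.235
```

Because H1 allows p to be either smaller or larger than 0.235, this is a two-sided test. That matches the default alternative = "two.sided" in R's prop.test().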
In R, we can test the null hypothesis that a proportion is equal to some given value using the prop.test() function. To see how to use the function, type ?prop.test in the console, or search for prop.test in the help tab.
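You can also inspect the function's signature directly from the console. The sketch below uses base R's args() and formals(). In the printed signature, arguments shown with "= value" have a default, and arguments shown without one do not.

```r
# Print the signature of prop.test, including default values
args(prop.test)

# Names of all formal arguments, in order
names(formals(prop.test))
```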
Which arguments of the prop.test function are required, and which are optional?
(a) x, n, and p are required; alternative, conf.level, and correct are optional.
(b) x and n are required; p, alternative, conf.level, and correct are optional.
(c) x, n, p, alternative, conf.level, and correct are all required.
(d) x, n, p, alternative, conf.level, and correct are all optional.
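Do we need to specify a value for p here? If so, what should it be?
(a) No, we don't need to specify p
(b) Yes, a p of 0.5
(c) Yes, a p of 0.235
(d) Yes, a p of NULL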
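Yes, we need to specify p. Our null hypothesis is that the probability of a G allele is 0.235, so p should be 0.235. If p is left at its default of NULL, a one-sample prop.test() tests against a probability of 0.5 instead.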
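A quick way to see why p must be supplied is to compare the null value R reports with and without it. The sketch below relies on the null.value component of the returned test object. The variable names are our own.

```r
# With p left at its default (NULL), a one-sample prop.test tests H0: p = 0.5
default_p <- prop.test(27, 68)

# Supplying p = 0.235 tests the null hypothesis we actually care about
ours <- prop.test(27, 68, p = 0.235)

default_p$null.value   # the null probability R used when p was omitted
ours$null.value        # the null probability we asked for
```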
In our case, we have 27 G alleles out of a total of 68 alleles, and the probability of a G allele under the null hypothesis is 0.235:
prop.test(x = 27, n = 68, p = 0.235)
Run this test. You should see the following output:
1-sample proportions test with continuity correction
data: 27 out of 68, null probability 0.235
X-squared = 9.053, df = 1, p-value = 0.002623
alternative hypothesis: true p is not equal to 0.235
95 percent confidence interval:
0.2826780 0.5231249
sample estimates:
p
0.3970588
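prop.test() returns an object of class "htest", which is a named list. That means the numbers in the printed output can also be extracted by name instead of being read off the screen. A short sketch follows; the variable name res is our own.

```r
# Store the test result; prop.test() returns an "htest" object (a named list)
res <- prop.test(x = 27, n = 68, p = 0.235)

res$estimate   # sample proportion of G alleles, 27/68
res$conf.int   # 95% confidence interval for the proportion
res$statistic  # continuity-corrected X-squared statistic
res$p.value    # two-sided p-value
```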
The sample estimate (i.e. the estimate of the probability of a G allele from the sample) is 0.397.
The 95% confidence interval for the proportion is [0.283, 0.523].
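As far as we can tell from R's documentation, the one-sample interval from prop.test() is a Wilson score interval with a continuity correction when correct = TRUE. The sketch below recomputes it by hand so the numbers are less of a black box. The function name wilson_cc is hypothetical. It also fixes the continuity correction at 0.5, which is an assumption: prop.test() reduces the correction when the observed count is very close to n times p. That does not happen with these data.

```r
# Hypothetical helper: continuity-corrected Wilson score interval for one proportion
wilson_cc <- function(x, n, conf.level = 0.95) {
  p_hat <- x / n
  z     <- qnorm((1 + conf.level) / 2)  # two-sided critical value (about 1.96 for 95%)
  z22n  <- z^2 / (2 * n)
  yates <- 0.5                          # continuity correction, assumed fixed at 0.5

  # Upper limit: shift the estimate up by the correction, then apply the Wilson formula
  p_c <- p_hat + yates / n
  upper <- if (p_c >= 1) 1 else
    (p_c + z22n + z * sqrt(p_c * (1 - p_c) / n + z22n / (2 * n))) / (1 + 2 * z22n)

  # Lower limit: shift the estimate down by the correction
  p_c <- p_hat - yates / n
  lower <- if (p_c <= 0) 0 else
    (p_c + z22n - z * sqrt(p_c * (1 - p_c) / n + z22n / (2 * n))) / (1 + 2 * z22n)

  c(lower = max(lower, 0), upper = min(upper, 1))
}

wilson_cc(27, 68)                      # by-hand interval
prop.test(27, 68, p = 0.235)$conf.int  # interval reported by prop.test()
```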
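The p-value is 0.002623. This means that if the null hypothesis were true, there would be a 0.2623% chance of seeing data at least as extreme as ours, that is, a sample proportion at least as far from 0.235 (in either direction) as the 0.397 we observed.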
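To see where this p-value comes from, the sketch below recomputes the one-sample, continuity-corrected X-squared statistic, (|x − n·p0| − c)² / (n·p0·(1 − p0)) with c = min(0.5, |x − n·p0|). It then takes the upper-tail chi-squared probability with 1 degree of freedom. This formula is our reconstruction of what prop.test() does in the two-sided, one-sample case. The variable names are our own.

```r
x  <- 27      # observed G alleles
n  <- 68      # total alleles
p0 <- 0.235   # G allele frequency under the null hypothesis

expected <- n * p0                         # expected G count under H0 (15.98)
yates    <- min(0.5, abs(x - expected))    # continuity correction
stat     <- (abs(x - expected) - yates)^2 / (n * p0 * (1 - p0))

# Probability, under H0, of a statistic at least this large
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)

round(stat, 3)      # compare with X-squared in the output above
signif(p_value, 4)  # compare with the p-value in the output above
```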
Since the p-value of 0.002623 is less than our predetermined threshold of 0.05, we would reject the null hypothesis and conclude that the proportion of G alleles in this population is different to 0.235.
Since the entire 95% confidence interval lies above 0.235, we'd conclude that the G allele frequency in this population is more than 0.235.
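The optional alternative argument controls the direction of the test. The sketch below reruns the test with alternative = "greater" (H1: p > 0.235), purely to illustrate that argument. In a real analysis, a one-sided alternative should be chosen before looking at the data, not after seeing which side the interval falls on. The variable names are our own.

```r
# Two-sided test (the default) and a one-sided test with H1: p > 0.235
two_sided <- prop.test(x = 27, n = 68, p = 0.235)
greater   <- prop.test(x = 27, n = 68, p = 0.235, alternative = "greater")

greater$p.value    # one-sided p-value (half the two-sided one when the estimate is above p0)
greater$conf.int   # one-sided interval: the upper limit is 1
```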
Some possible explanations: