To introduce the ideas of sample size and statistical power, we'll start with a recap
of the hypothesis testing framework we use. In statistical hypothesis testing, we
form two competing hypotheses.
The Null and Alternative Hypotheses
The null hypothesis, H0, is what we would conservatively believe without
any evidence to the contrary.
The alternative hypothesis, Ha, is the opposite of the null
hypothesis. It is typically what we are trying to "prove" (or provide evidence to support).
The Hypothesis Testing Process
Define a p-value threshold, α
Collect data to test the hypothesis
Compute the p-value: the probability we'd observe data at least as extreme as the
data we actually observed, if the null hypothesis were true
If the p-value is less than α, we reject the null hypothesis and conclude that
we have strong evidence to support the alternative hypothesis.
This is called a positive outcome.
If the p-value is greater than or equal to α, we do not reject the null hypothesis, and
draw no conclusion. This is called a negative outcome.
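As a concrete sketch of these steps, the Python below runs a two-sided one-sample z-test using only the standard library. The choice of test is an assumption made for illustration (it presumes independent, normally distributed data with a known standard deviation `sigma`), the data are made up, and the function names `z_test_p_value` and `hypothesis_test` are hypothetical, not part of any library.

```python
import math
from statistics import NormalDist


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: the population mean equals mu0.

    Assumes independent draws from a normal distribution whose standard
    deviation sigma is known (an illustrative assumption).
    """
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    # Probability, if H0 were true, of a z statistic at least as extreme as ours
    return 2 * (1 - NormalDist().cdf(abs(z)))


def hypothesis_test(sample, mu0, sigma, alpha=0.05):
    """Compute the p-value and compare it with the threshold alpha."""
    p = z_test_p_value(sample, mu0, sigma)
    outcome = "positive" if p < alpha else "negative"  # reject H0 only if p < alpha
    return outcome, p


if __name__ == "__main__":
    # The collected data (made-up numbers, for illustration only)
    sample = [5.1, 4.9, 5.6, 5.3, 5.8, 5.2, 5.5, 5.0]
    outcome, p = hypothesis_test(sample, mu0=5.0, sigma=0.5, alpha=0.05)
    print(outcome, round(p, 3))
```

With these made-up numbers the sample mean is 5.3 and the p-value is roughly 0.09, so the outcome is negative at α = 0.05: we do not reject H0, which is not the same as concluding that H0 is true.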
Because the data we collect are subject to random effects (i.e., effects our experiment
cannot or does not control for), there is always a possibility that this process
will lead to an incorrect conclusion.
Type I and Type II Errors
The two types of error, Type I and Type II errors, both refer to the situation where
the conclusion we draw from a hypothesis test differs from what is really true. It's
important to note here that we essentially never know what is really true; all we know
in practice is the outcome of our experiment and the conclusion we drew from it.
A Type I Error occurs when we reject the null hypothesis (i.e. we get a p-value
less than α) when in reality the null hypothesis is true. This is also known
as a false positive.
A Type II Error occurs when we fail to reject the null hypothesis (i.e. we get a
p-value greater than or equal to α) when in reality the null hypothesis is false. This is also
known as a false negative.
These can be summarized in the following table:
Reality \ Experimental Result | Fail to reject null hypothesis (Negative) | Reject null hypothesis (Positive)
Null hypothesis is true (Negative) | True negative | False positive
Null hypothesis is false (Positive) | False negative | True positive
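The table is a lookup from two facts (what is true in reality, and what the test decided) to one of four labels. The Python sketch below restates it; `classify_outcome` is a hypothetical helper, and it is only usable in simulations, because in a real study we never get to know `null_is_true`.

```python
def classify_outcome(null_is_true: bool, rejected_null: bool) -> str:
    """Return the table cell for a given reality and test decision."""
    if null_is_true:
        # H0 true: rejecting it is a Type I error (false positive)
        return "False positive" if rejected_null else "True negative"
    # H0 false: failing to reject it is a Type II error (false negative)
    return "True positive" if rejected_null else "False negative"


if __name__ == "__main__":
    for null_is_true in (True, False):
        for rejected_null in (False, True):
            print(null_is_true, rejected_null,
                  classify_outcome(null_is_true, rejected_null))
```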
So far, we have focussed on the false positive rate, which is the probability
of obtaining a false positive when the null hypothesis is true. The false
positive rate is exactly α (provided the test's assumptions hold), since this is the
probability we will get a p-value less than α when the null hypothesis is true.
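This claim can be checked by simulation. The sketch below (Python, standard library only) repeatedly generates data for which the null hypothesis is true by construction, runs a two-sided z-test with known σ, and counts how often p < α. The test, the sample size, the seed, and the helper names (`z_test_p_value`, `false_positive_rate`) are illustrative assumptions; the observed rate should land close to α, differing only by simulation noise.

```python
import math
import random
from statistics import NormalDist


def z_test_p_value(sample, mu0, sigma):
    """Two-sided one-sample z-test p-value, assuming known sigma."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(alpha=0.05, mu0=0.0, sigma=1.0, n=20,
                        trials=20_000, seed=1):
    """Fraction of simulated experiments that reject H0 when H0 is true."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # H0 is true by construction: the data really have mean mu0
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) < alpha:
            false_positives += 1
    return false_positives / trials


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        print(alpha, false_positive_rate(alpha=alpha))
```

Because every simulated dataset has H0 true, every rejection here is a Type I error. Changing `alpha` moves the rate with it, while changing `n` leaves it near α.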
But what about false negatives? We denote by β the probability that we get a
false negative when the null hypothesis is false. The statistical
power of our experiment/study is the probability we get a true positive when
the null hypothesis is false; i.e. it's the probability that if our alternative
hypothesis is true in reality, our experiment is "powerful" enough to detect that
and to produce a positive result. When the null hypothesis is false, every outcome
is either a true positive or a false negative, so the statistical power is 1-β.
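Unlike α, β is not chosen directly: it depends on how far the truth is from the null hypothesis, as well as on α, the noise level, and the sample size. The sketch below keeps the illustrative assumption of a two-sided z-test with known σ (helper names are hypothetical). It estimates the power 1-β by simulating data for which the null hypothesis is false, and compares that estimate with the standard closed-form power of this z-test, Φ(δ - z) + Φ(-δ - z), where δ = (μ - μ0)√n/σ and z is the 1 - α/2 quantile of the standard normal.

```python
import math
import random
from statistics import NormalDist


def z_test_p_value(sample, mu0, sigma):
    """Two-sided one-sample z-test p-value, assuming known sigma."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulated_power(true_mean, mu0=0.0, sigma=1.0, n=20, alpha=0.05,
                    trials=20_000, seed=2):
    """Fraction of simulated experiments that reject H0 when H0 is false."""
    rng = random.Random(seed)
    positives = 0
    for _ in range(trials):
        # H0 is false by construction whenever true_mean != mu0
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) < alpha:
            positives += 1
    return positives / trials  # estimate of 1 - beta


def analytic_power(true_mean, mu0=0.0, sigma=1.0, n=20, alpha=0.05):
    """P(|Z| > z_crit) where Z ~ N(shift, 1) is the z statistic under Ha."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = (true_mean - mu0) * math.sqrt(n) / sigma
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


if __name__ == "__main__":
    power = analytic_power(0.5)
    print("analytic power:", round(power, 3), "beta:", round(1 - power, 3))
    print("simulated power:", simulated_power(0.5))
```

With a true mean of 0.5, σ = 1 and n = 20, the closed-form power is about 0.61 (so β is about 0.39), and the simulated estimate should agree to within simulation noise. Setting `true_mean` equal to `mu0` makes the "power" collapse to α, since H0 is then true.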
What is a Type I Error?
(a) Coming to the wrong conclusion because of poor experimental design
(b) Coming to the wrong conclusion because of technical errors in the experiment
(c) Getting a p-value below α and concluding the hypothesis of interest is true, when in reality the hypothesis of interest is false.
(d) Getting a p-value above α and failing to conclude the hypothesis of interest is true, when in reality it is true.
(a), (b): Incorrect. Type I and Type II errors occur because the data we collect are subject
to random effects; there is always the possibility of either type of
error, no matter how well the experiment or study is designed and
performed.
(c): Correct! Type I errors are also known as "false positives": they occur
when we reject the null hypothesis and conclude the alternative
hypothesis (our hypothesis of interest) is correct, when in reality
our hypothesis of interest is incorrect.
(d): Incorrect. Type I errors are also known as "false positives"; Type II errors
are "false negatives".
What is a Type II Error?
(a) Coming to the wrong conclusion because of poor experimental design
(b) Coming to the wrong conclusion because of technical errors in the experiment
(c) Getting a p-value below α and concluding the hypothesis of interest is true, when in reality the hypothesis of interest is false.
(d) Getting a p-value above α and failing to conclude the hypothesis of interest is true, when in reality it is true.
(a), (b): Incorrect. Type I and Type II errors occur because the data we collect are subject
to random effects; there is always the possibility of either type of
error, no matter how well the experiment or study is designed and
performed.
(c): Incorrect. Type I errors are also known as "false positives"; Type II errors
are "false negatives".
(d): Correct! Type II errors are also known as "false negatives": they occur
when we fail to reject the null hypothesis and fail to conclude the alternative
hypothesis (our hypothesis of interest) is correct, when in reality
our hypothesis of interest is correct.
Summary
Type I and Type II errors are consequences of random effects in the data we collect.
The possibility of Type I and Type II errors cannot be eliminated.
The probability of a Type I error, if the null hypothesis is true, is denoted α
The probability of a Type II error, if the null hypothesis is false, is denoted β
The probability of a true positive, if the null hypothesis is false, is 1-β, and is called the
statistical power
So far we've discussed α a lot; in the next section, we'll talk about
controlling the statistical power by choosing an appropriate sample size for our experiments.