Sample Size and Power: Introduction

Recap of Hypothesis Testing

To introduce the ideas of sample size and statistical power, we'll start with a recap of the hypothesis testing framework we use. In statistical hypothesis testing, we form two competing hypotheses.

The Null and Alternative Hypotheses

The Hypothesis Testing Process

Because the data we collect are subject to random effects (i.e. effects our experiment cannot or does not control for), there is always a possibility that this process will lead to an incorrect conclusion.

Type I and Type II Errors

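The two types of error, Type I and Type II errors, both refer to the situation where the conclusion we draw from a hypothesis test differs from what is really true. A Type I error is a false positive: we reject the null hypothesis when it is in fact true. A Type II error is a false negative: we fail to reject the null hypothesis when it is in fact false. It's important to note here that we essentially never know what is really true; all we know in practice is the outcome of our experiment and the conclusion we drew from it.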

These can be summarized in the following table:

                                       Experimental result
Reality                                Fail to reject null hypothesis (Negative)    Reject null hypothesis (Positive)
-----------------------------------    -----------------------------------------    ---------------------------------
Null hypothesis is true (Negative)     True negative                                False positive
Null hypothesis is false (Positive)    False negative                               True positive
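
The four cells of the table can also be written as a small lookup. This is only an illustrative sketch: the function name classify_outcome and its two arguments are hypothetical, and in a real study the value of null_is_true is never known.

```python
def classify_outcome(null_is_true: bool, rejected_null: bool) -> str:
    """Name the table cell for one experiment.

    null_is_true is an illustrative input only; in practice it is unknown.
    """
    if null_is_true:
        # Top row of the table: the null hypothesis holds in reality.
        return "false positive (Type I error)" if rejected_null else "true negative"
    # Bottom row of the table: the null hypothesis is false in reality.
    return "true positive" if rejected_null else "false negative (Type II error)"


# Print all four combinations of reality and experimental result.
for null_is_true in (True, False):
    for rejected_null in (False, True):
        print(null_is_true, rejected_null, "->",
              classify_outcome(null_is_true, rejected_null))
```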

So far, we have focussed on the false positive rate, which is the probability of obtaining a false positive when the null hypothesis is true. The false positive rate is exactly α, since α is the probability that we will get a p-value less than α when the null hypothesis is true.
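
The claim that the false positive rate equals α can be checked by simulation. The sketch below is one possible setup, not part of the original material: it assumes a two-sided one-sample z-test with known standard deviation 1, a sample size of 20, and 20,000 repeated experiments in which the null hypothesis (mean 0) really is true. The function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0, assuming known sigma."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(alpha=0.05, n=20, trials=20000, seed=1):
    """Fraction of experiments that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Data are drawn with mean 0, so the null hypothesis holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            rejections += 1  # a false positive
    return rejections / trials


rate = false_positive_rate()
print(f"estimated false positive rate: {rate:.3f}")
```

With these assumptions the estimate should land close to the chosen α of 0.05, up to simulation noise.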

But what about false negatives? We denote by β the probability that we get a false negative when the null hypothesis is false. The statistical power of our experiment/study is the probability we get a true positive when the null hypothesis is false; i.e. it's the probability that if our alternative hypothesis is true in reality, our experiment is "powerful" enough to detect that and to produce a positive result. When the null hypothesis is false, a false negative and a true positive are the only two possible outcomes, so their probabilities sum to 1 and the statistical power is 1-β.
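
The relationship between β and power can be illustrated in the same way. The following is a sketch under stated assumptions, not a definitive method: a two-sided one-sample z-test with known standard deviation 1, a sample size of 20, and a true mean of 0.5 (so the null hypothesis of mean 0 is false). The function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def rejects_null(sample, alpha, mu0=0.0, sigma=1.0):
    """True if a two-sided z-test (known sigma) gives a p-value below alpha."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def estimate_power(true_mean=0.5, alpha=0.05, n=20, trials=20000, seed=2):
    """Fraction of experiments that reject H0 when H0 is actually false."""
    rng = random.Random(seed)
    true_positives = 0
    for _ in range(trials):
        # Data are drawn with a non-zero mean, so the null hypothesis is false.
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if rejects_null(sample, alpha):
            true_positives += 1
    return true_positives / trials


def analytic_power(true_mean=0.5, alpha=0.05, n=20, mu0=0.0, sigma=1.0):
    """Exact power of the same z-test, for comparison with the simulation."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = (true_mean - mu0) / (sigma / n ** 0.5)
    # Probability that the test statistic falls in either rejection tail.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


power = estimate_power()
beta = 1 - power  # estimated false negative rate
print(f"simulated power: {power:.3f}, beta: {beta:.3f}")
print(f"analytic power:  {analytic_power():.3f}")
```

The simulated and analytic values should agree up to simulation noise, and the two estimated probabilities (power and β) sum to 1 by construction.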

  1. What is a Type I Error?
    1. Coming to the wrong conclusion because of poor experimental design
    2. Coming to the wrong conclusion because of technical errors in the experiment
    3. Getting a p-value below α and concluding the hypothesis of interest is true, when in reality the hypothesis of interest is false.
    4. Getting a p-value above α and failing to conclude the hypothesis of interest is true, when in reality it is true.
  2. What is a Type II Error?
    1. Coming to the wrong conclusion because of poor experimental design
    2. Coming to the wrong conclusion because of technical errors in the experiment
    3. Getting a p-value below α and concluding the hypothesis of interest is true, when in reality the hypothesis of interest is false.
    4. Getting a p-value above α and failing to conclude the hypothesis of interest is true, when in reality it is true.

Summary

  • Type I and Type II errors are consequences of random effects in the data we collect.
  • The possibility of Type I and Type II errors cannot be eliminated.
  • The probability of a Type I error, if the null hypothesis is true, is denoted α.
  • The probability of a Type II error, if the null hypothesis is false, is denoted β.
  • The probability of a true positive, if the null hypothesis is false, is 1-β, and is called the statistical power.

So far we've discussed α a lot; in the next section, we'll talk about controlling the statistical power by choosing an appropriate sample size for our experiments.