BMR 617 - Sample Size and Power: Introduction

Recap of Hypothesis Testing

To introduce the ideas of sample size and statistical power, we'll start with a recap of the hypothesis testing framework we use. In statistical hypothesis testing, we form two competing hypotheses.

The Null and Alternative Hypotheses

The null hypothesis, H₀, is what we would conservatively believe without any evidence to the contrary.
The alternative hypothesis, H_a, is the opposite of the null hypothesis. It is typically what we are trying to "prove" (or provide evidence to support).

The Hypothesis Testing Process

Define a p-value threshold, α
Collect data to test the hypothesis
Compute the p-value: the probability we'd observe data at least as extreme as the data we actually observed, if the null hypothesis were true
If the p-value is less than α, we reject the null hypothesis and conclude that we have strong evidence to support the alternative hypothesis. This is called a positive outcome.
If the p-value is greater than or equal to α, we do not reject the null hypothesis, and draw no conclusion. This is called a negative outcome.
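
To make the steps above concrete, here is a minimal sketch in Python using only the standard library. It assumes a two-sided one-sample z-test with a known population standard deviation, which is a simplification chosen for illustration; the function names, the measurements, and the values of mu0 and sigma are all made up for this example.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sided_z_test(sample, mu0, sigma):
    """Return the two-sided p-value for H0: population mean == mu0.

    Assumes independent draws from a normal distribution with known
    standard deviation sigma (an illustrative simplification).
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Probability of a test statistic at least as extreme as |z| if H0 is true
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p < alpha."""
    if p_value < alpha:
        return "reject H0 (positive outcome)"
    return "fail to reject H0 (negative outcome)"

alpha = 0.05                                        # step 1: threshold
sample = [5.1, 4.9, 5.6, 5.8, 5.3, 5.7, 5.2, 5.9]   # step 2: made-up data
p = two_sided_z_test(sample, mu0=5.0, sigma=0.5)    # step 3: p-value
print(f"p = {p:.4f}: {decide(p, alpha)}")           # steps 4 and 5: decision
```

In practice the test would be chosen to match the study design; the z-test is used here only because its p-value can be computed in a few lines.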

Because the data we collect are subject to random effects (i.e. effects our experiment cannot or does not control for), there is always a possibility this process will lead to the incorrect conclusion.

Type I and Type II Errors

The two types of error, Type I and Type II errors, both refer to the situation where the conclusion we draw from a hypothesis test differs from what is really true. It's important to note here that we essentially never know what is really true; all we know in practice is the outcome of our experiment and the conclusion we drew from it.

A Type I Error occurs when we reject the null hypothesis (i.e. we get a p-value less than α) when in reality the null hypothesis is true. This is also known as a false positive.
A Type II Error occurs when we fail to reject the null hypothesis (i.e. we get a p-value greater than or equal to α) when in reality the null hypothesis is false. This is also known as a false negative.

These can be summarized in the following table:

		Experimental result
		Fail to reject null hypothesis (negative)	Reject null hypothesis (positive)
Reality	Null hypothesis is true (negative)	True negative	False positive (Type I error)
Reality	Null hypothesis is false (positive)	False negative (Type II error)	True positive
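
The four cells of the table can also be written as a small lookup. This is only an illustrative sketch: the function name and its arguments are hypothetical, and in a real study the first argument (what is really true) is something we never get to observe.

```python
def classify_outcome(null_is_true, rejected_null):
    """Label a test outcome given the (normally unknown) truth and our decision."""
    if rejected_null:
        return "false positive (Type I error)" if null_is_true else "true positive"
    return "true negative" if null_is_true else "false negative (Type II error)"

# Print all four combinations, in the same order as the table
for null_is_true in (True, False):
    for rejected_null in (False, True):
        label = classify_outcome(null_is_true, rejected_null)
        print(f"null true: {null_is_true}, rejected: {rejected_null} -> {label}")
```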

So far, we have focussed on the false positive rate, which is the probability of obtaining a false positive when the null hypothesis is true. The false positive rate is exactly α, since this is the probability we will get a p-value less than α when the null hypothesis is true.
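
One way to see that the false positive rate matches α is to simulate many experiments in which the null hypothesis really is true and count how often we reject it. The sketch below assumes normally distributed data and a two-sided z-test with known standard deviation; the function names, sample size, number of trials, and random seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def two_sided_p_value(sample, mu0, sigma):
    """Two-sided z-test p-value for H0: population mean == mu0 (sigma known)."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(alpha=0.05, n=20, trials=4000, seed=1):
    """Estimate how often we reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0
    rejections = 0
    for _ in range(trials):
        # Data generated with the null hypothesis true: the mean really is mu0
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if two_sided_p_value(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / trials

rate = false_positive_rate()
print(f"Estimated false positive rate: {rate:.3f} (alpha = 0.05)")
```

The estimate should land close to α, with the remaining difference due to the finite number of simulated experiments.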

But what about false negatives? We denote by β the probability that we get a false negative when the null hypothesis is false. The statistical power of our experiment/study is the probability we get a true positive when the null hypothesis is false; i.e. it's the probability that if our alternative hypothesis is true in reality, our experiment is "powerful" enough to detect that and to produce a positive result. When the null hypothesis is false, a true positive and a false negative are the only two possible outcomes, so their probabilities sum to 1 and the statistical power is 1-β.
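
As a rough illustration of how β and the power relate, the sketch below computes the power of a two-sided z-test analytically. It assumes normally distributed data with known standard deviation and a specific true difference from the null value; the function name and the numbers passed to it (effect, sigma, n) are hypothetical values picked for the example.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (sigma known).

    effect is the true difference between the population mean and the
    value stated in H0; effect == 0 means H0 is actually true.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # rejection cutoff for |z|
    shift = effect * sqrt(n) / sigma             # mean of z under this effect
    # P(reject H0) = P(z > z_crit) + P(z < -z_crit), where z ~ Normal(shift, 1)
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

power = z_test_power(effect=0.5, sigma=1.0, n=20)
beta = 1 - power
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

Setting effect to 0 in this sketch makes the null hypothesis true, and the function then returns α, the false positive rate.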

What is a Type I Error?
1. Coming to the wrong conclusion because of poor experimental design
2. Coming to the wrong conclusion because of technical errors in the experiment
3. Getting a p-value below α and concluding the hypothesis of interest is true, when in reality the hypothesis of interest is false.
4. Getting a p-value above α and failing to conclude the hypothesis of interest is true, when in reality it is true.
Options 1 and 2: Incorrect
Type I and Type II errors occur because the data we collect are subject to random effects; there is always the possibility of either type of error, no matter how well the experiment or study is designed and performed.

Option 4: Incorrect
Type I errors are also known as "false positives"; Type II errors are "false negatives".

Option 3: Correct!
Type I errors are also known as "false positives": they occur when we reject the null hypothesis and conclude that the alternative hypothesis (our hypothesis of interest) is correct, when in reality our hypothesis of interest is incorrect.

What is a Type II Error?
1. Coming to the wrong conclusion because of poor experimental design
2. Coming to the wrong conclusion because of technical errors in the experiment
3. Getting a p-value below α and concluding the hypothesis of interest is true, when in reality the hypothesis of interest is false.
4. Getting a p-value above α and failing to conclude the hypothesis of interest is true, when in reality it is true.
Options 1 and 2: Incorrect
Type I and Type II errors occur because the data we collect are subject to random effects; there is always the possibility of either type of error, no matter how well the experiment or study is designed and performed.

Option 3: Incorrect
Type I errors are also known as "false positives"; Type II errors are "false negatives".

Option 4: Correct!
Type II errors are also known as "false negatives": they occur when we fail to reject the null hypothesis, and so fail to conclude that the alternative hypothesis (our hypothesis of interest) is correct, when in reality it is correct.

Summary

Type I and Type II errors are consequences of random effects in the data we collect.
The possibility of Type I and Type II errors cannot be eliminated.
The probability of a Type I error, if the null hypothesis is true, is denoted α.
The probability of a Type II error, if the null hypothesis is false, is denoted β.
The probability of a true positive, if the null hypothesis is false, is 1-β, and is called the statistical power.

So far we've discussed α a lot; in the next section, we'll talk about controlling the statistical power by choosing an appropriate sample size for our experiments.