library(tidyverse)

Sample Size and Power

The principle investigator in your lab is interested in comparing the cholesterol level of a strain of mice (not the TALLYHO strain) to the B6 strain, with both strains being fed the Chow diet.

We previously performed a similar experiment with TALLYHO mice, comparing their cholesterol level to B6 mice. We will use the data from that experiment to estmiate the cholesterol level we expect in B6 and the standard deviation we expect in our new experiment. In that previous experiment, the mean cholesterol level of B6 mice was 42.95mg/dl, with a standard deviation of 12.0mg/dl.

The effect size should be the smallest change in cholesterol level that we would find to be biologically interesting. This is a subjective decision (we'll see how to deal with that shortly). For our purposes, we consider a change of about 50% to be interesting, so (using round numbers) we'll use a delta of 20mg/dl.

Computing a sample size

To find the number of mice needed to give us a statistical power of 0.8 (i.e. an 80% chance of getting a significantly significant result) if the difference in mean cholesterol level is at least 20mg/dl, assuming a pooled standard deviation of 12mg/dl, we can use

power.t.test(n=NULL, power=0.8, sd=12, delta=20)
## 
##      Two-sample t test power calculation 
## 
##               n = 6.76095
##           delta = 20
##              sd = 12
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

This results in n=6.76. This means a sample size of n=6 per group would give less than 80% power, and a sample size of n=7 per group would give more than 80% power. Therefore, we'll choose a sample size of 7 mice per group for our experiment.

Computing the power for this sample size under other assumptions

It's always a good idea to see the consequences of our choice of sample size if we use different effect sizes and/or different assumptions about the standard deviation.

We'll compute the statistical power we get using 7 mice per group for a range of effect sizes from 12mg/dl to 28mg/dl (going in steps of 4mg/dl), and for a range of standard deviations from 8mg/dl to 16mg/dl (in steps of 2mg/dl). Start by putting these values into vectors:

sds <- seq(8, 16, by=2)
effects <- seq(12, 28, by=4)

There are 5 effect sizes and 5 standard deviations, so we'll compute a total of 25 statistical powers. We'll put these in a matrix with the rows being the standard deviations and the columns being the effect sizes. Start with an empty matrix:

powers <- matrix(
  dimnames = list(paste("sd:",sds,"mg/dl"), paste("Effect:", effects, "mg/dl")),
  nrow=length(sds), 
  ncol=length(effects)
)
powers
##              Effect: 12 mg/dl Effect: 16 mg/dl Effect: 20 mg/dl
## sd: 8 mg/dl                NA               NA               NA
## sd: 10 mg/dl               NA               NA               NA
## sd: 12 mg/dl               NA               NA               NA
## sd: 14 mg/dl               NA               NA               NA
## sd: 16 mg/dl               NA               NA               NA
##              Effect: 24 mg/dl Effect: 28 mg/dl
## sd: 8 mg/dl                NA               NA
## sd: 10 mg/dl               NA               NA
## sd: 12 mg/dl               NA               NA
## sd: 14 mg/dl               NA               NA
## sd: 16 mg/dl               NA               NA

To fill in the matrix, we need to compute the statistical power for each standard deviation and each effect size, given a sample size of 7 mice per group. For the first entry in the matrix, we could do

power.t.test(n=7, sd=8, delta=12)
## 
##      Two-sample t test power calculation 
## 
##               n = 7
##           delta = 12
##              sd = 8
##       sig.level = 0.05
##           power = 0.7313279
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

Note we can get the value of interest, power from here using

power.t.test(n=7, sd=8, delta=12)$power
## [1] 0.7313279

To put this into the matrix, we can do

rowname <- paste("sd:", 8, "mg/dl")
colname <- paste("Effect:", 12, "mg/dl")
powers[rowname, colname] <- power.t.test(n=7, sd=8, delta=12)$power
powers
##              Effect: 12 mg/dl Effect: 16 mg/dl Effect: 20 mg/dl
## sd: 8 mg/dl         0.7313279               NA               NA
## sd: 10 mg/dl               NA               NA               NA
## sd: 12 mg/dl               NA               NA               NA
## sd: 14 mg/dl               NA               NA               NA
## sd: 16 mg/dl               NA               NA               NA
##              Effect: 24 mg/dl Effect: 28 mg/dl
## sd: 8 mg/dl                NA               NA
## sd: 10 mg/dl               NA               NA
## sd: 12 mg/dl               NA               NA
## sd: 14 mg/dl               NA               NA
## sd: 16 mg/dl               NA               NA

We need to do that 25 times. Instead of having 25 lines of code, we can use for loops:

for (sd in sds) {
  for (effect in effects) {
    rowname <- paste("sd:", sd, "mg/dl")
    colname <- paste("Effect:", effect, "mg/dl")
    powers[rowname, colname] <- power.t.test(n=7, sd=sd, delta=effect)$power
  }
}

This gives

as_tibble(powers, rownames="SD")

We can write this out to a CSV file, which we might import into a grant application:

write.csv(powers, "PowerCalculation7Mice.csv")