BMR 617: Statistical Techniques for the Biomedical Sciences

Principles for presenting statistical analyses - Part 2: Results and Graphics

Last time we discussed the methods section of a journal article, with respect to presenting statistical methods. The methods section should include details about which statistical analyses were used, and which software was used to implement them. Ultimately, the aim is that anyone should be able to reproduce the results given the raw data and the information in the article.

This time we'll look at presenting results in more detail. We'll continue with our example analysis:

	
library(tidyverse)
# Read the data and split MouseID into Strain, Diet, and ID columns
met <- read_csv("https://denvirlab.marshall.edu/BMR617-2022/data/TH-B6-metabolic.csv") %>%
  separate(MouseID, into=c("Strain", "Diet", "ID"), sep='-')

# Two-way ANOVA with interaction: estimates, 95% CIs, and the ANOVA table
aov.full <- aov(Cholesterol ~ Strain * Diet, data = met)
aov.full$coefficients
confint(aov.full)
summary(aov.full)

# Write the figure to a high-resolution PNG file (Cholesterol is on the y axis)
png(filename = "figure1.png", width=4*960, height = 4*480, res=4*72)
ggplot(met, aes(x=Strain, y=Cholesterol, fill=Diet)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_point(position=position_jitterdodge(jitter.width = 0.1, dodge.width = 0.75)) +
  ylab("Cholesterol Level (mg/dl)") +
  ggtitle("Cholesterol Level by Mouse Strain and Diet")
dev.off()

# Record the R and package versions used, for the methods section
sessionInfo()

The results section

The results section should report the results of each statistical analysis performed. These can be presented as text, tables, graphs, or a combination. In every case, the results of each statistical test should include:

The effect size. Depending on the analysis, this is the difference in means between two groups (for a t-test, or for post-hoc tests after an ANOVA), the relative risk or odds ratio (for comparing proportions), a correlation coefficient (for a correlation analysis), or an $R^2$ value or parameter estimate (for a linear regression).
Confidence intervals for the effect sizes. State the level of confidence chosen (this will usually be 95%).
The p-value. Avoid simply characterizing the p-value as "significant" or "not significant"; always give an actual value. Include all p-values, even those not considered significant.
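As a sketch of how these three quantities can be pulled from a test result in R, the following runs a two-sample t-test on simulated data. The groups `groupA` and `groupB` and their means are made up purely for illustration. `t.test()` returns the group means, the confidence interval for their difference, and the exact p-value as components of its result.

```r
# Simulated data for illustration only (not from the mouse study)
set.seed(617)
groupA <- rnorm(10, mean = 50, sd = 10)
groupB <- rnorm(10, mean = 65, sd = 10)

# Welch two-sample t-test (R's default); the CI is for mean(groupB) - mean(groupA)
tt <- t.test(groupB, groupA, conf.level = 0.95)

effectSize <- unname(tt$estimate[1] - tt$estimate[2])  # difference in means
ci <- tt$conf.int                                      # 95% confidence interval
p <- tt$p.value                                        # exact p-value

cat(sprintf("Difference in means: %.2f (95%% CI [%.2f, %.2f]), p = %.3g\n",
            effectSize, ci[1], ci[2], p))
```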

Always give enough information to make clear what the results mean. For example, when reporting ANOVA coefficients, make clear which level of each factor is the "reference" level against which the other levels are compared.
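In R, a factor's reference level is its first level, and text converted to a factor gets its levels in alphabetical order. That is why, in the output below, the coefficients are labeled StrainTH, DietHF, and DietLF: B6 and Chow come first alphabetically, so they are the reference levels. A minimal sketch with a made-up vector `diet` shows how to check the levels and, if needed, choose a different reference with `relevel()`:

```r
# Hypothetical diet labels, for illustration only
diet <- factor(c("HF", "LF", "Chow", "HF", "Chow", "LF"))
levels(diet)                     # "Chow" "HF" "LF": Chow is the reference

# Chow is absorbed into the intercept, so only HF and LF get their own columns
colnames(model.matrix(~ diet))   # "(Intercept)" "dietHF" "dietLF"

# Make "LF" the reference level instead
dietLF <- relevel(diet, ref = "LF")
levels(dietLF)                   # "LF" "Chow" "HF"
colnames(model.matrix(~ dietLF)) # "(Intercept)" "dietLFChow" "dietLFHF"
```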

Here are the results of the ANOVA:


> aov.full <- aov(Cholesterol ~ Strain * Diet, data = met)
> aov.full$coefficients
    (Intercept)        StrainTH          DietHF          DietLF StrainTH:DietHF StrainTH:DietLF 
      42.945000       58.290000       72.140998       38.783333        8.934004        7.221669 
> confint(aov.full)
                      2.5 %    97.5 %
(Intercept)      13.1490209  72.74098
StrainTH         19.8235566  96.75644
DietHF           32.1654973 112.11650
DietLF            0.3168892  77.24978
StrainTH:DietHF -49.1490529  67.01706
StrainTH:DietLF -45.5208619  59.96420
> summary(aov.full)
            Df Sum Sq Mean Sq F value   Pr(>F)    
Strain       1  19984   19984  24.082 5.87e-05 ***
Diet         2  25773   12887  15.529 5.40e-05 ***
Strain:Diet  2    102      51   0.062     0.94    
Residuals   23  19086     830                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The key points here are:

Strain has an effect on Cholesterol level, with TH mice having higher Cholesterol levels than B6 mice.
Diet has an effect on Cholesterol level, with mice fed either the LF or the HF diet having higher Cholesterol levels than mice fed the Chow diet.
There is no evidence of an interaction between Strain and Diet on the Cholesterol level.
All these effects can be quantified, with confidence intervals computed for each.
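To see what the coefficients mean, it can help to rebuild the six group means from them by hand. With R's default treatment contrasts, the intercept is the mean for the reference group (B6 mice on Chow), and each group mean adds every term that applies to that group; for example, the TH/HF mean is (Intercept) + StrainTH + DietHF + StrainTH:DietHF. Because this model includes the full interaction, these fitted values equal the observed group means, up to the rounding of the printed coefficients. The sketch below simply copies the printed coefficients rather than refitting the model.

```r
# Coefficients copied from the printed aov.full$coefficients output above
coefs <- c("(Intercept)"     = 42.945000, "StrainTH"        = 58.290000,
           "DietHF"          = 72.140998, "DietLF"          = 38.783333,
           "StrainTH:DietHF" =  8.934004, "StrainTH:DietLF" =  7.221669)

int <- coefs[["(Intercept)"]]  # mean for the reference group: B6 mice on Chow

# Each group mean = intercept + every term that applies to that group
groupMeans <- rbind(
  B6 = c(Chow = int,
         HF   = int + coefs[["DietHF"]],
         LF   = int + coefs[["DietLF"]]),
  TH = c(Chow = int + coefs[["StrainTH"]],
         HF   = int + coefs[["StrainTH"]] + coefs[["DietHF"]] + coefs[["StrainTH:DietHF"]],
         LF   = int + coefs[["StrainTH"]] + coefs[["DietLF"]] + coefs[["StrainTH:DietLF"]])
)
groupMeans
```

Reading across a row shows the diet effects within a strain, and reading down a column shows the strain effect within a diet.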

Presenting results in the narrative

We can summarize the results above in a narrative. Each statement should be backed by an estimate with its confidence interval, and by a p-value. The following provides a concise, precise summary of the results:

Both Strain ($p=5.87\times10^{-5}$) and Diet ($p=5.40\times10^{-5}$) showed an effect on Cholesterol levels. The mean Cholesterol level for the control B6 mice fed the Chow diet was 42.95 mg/dl (95% CI [13.15, 72.74]). TH mice had Cholesterol levels an average of 58.29 mg/dl (95% CI [19.82, 96.76]) higher than B6 mice. Mice fed the low-fat, high-calorie ("LF") diet had Cholesterol levels an average of 38.78 mg/dl (95% CI [0.32, 77.25]) higher than those fed the Chow diet, while mice fed the high-fat ("HF") diet had Cholesterol levels an average of 72.14 mg/dl (95% CI [32.17, 112.12]) higher than those fed Chow. There was no apparent interaction between mouse Strain and Diet ($p=0.94$).

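Note here that we include exact p-values (not just p<0.05 or p>0.05), including the p-values for non-significant results. We also include a confidence interval each time we refer to an estimate. Finally, because the model includes an interaction term, the Strain estimate compares TH with B6 mice on the reference (Chow) diet, and each Diet estimate compares that diet with Chow in the reference (B6) strain.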
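One way to avoid transcription slips when writing such a narrative is to generate the numbers with code. The two helpers below, `fmtEstimate()` and `fmtP()`, are hypothetical functions written for this illustration, not part of any package. Here they are fed values copied from the output above; in practice the inputs could come from `aov.full$coefficients`, `confint(aov.full)`, and the `Pr(>F)` column of `summary(aov.full)[[1]]`.

```r
# Hypothetical helper: "estimate units (level% CI [lower, upper])", two decimals
fmtEstimate <- function(est, lower, upper, units = "mg/dl", level = 95L) {
  sprintf("%.2f %s (%d%% CI [%.2f, %.2f])", est, units, level, lower, upper)
}

# Hypothetical helper: a p-value to three significant digits
fmtP <- function(p) sprintf("p = %.3g", p)

fmtEstimate(58.29, 19.8235566, 96.75644)  # "58.29 mg/dl (95% CI [19.82, 96.76])"
fmtP(5.87e-05)                            # "p = 5.87e-05"
fmtP(0.94)                                # "p = 0.94"
```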

Wrangling the results into a table

To create a table with the results, we need to combine different parts of the output from R into a single table. We can do this with a bit of effort and some data wrangling in R. We'd like to create a table with the parameters, estimates, and 95% confidence intervals.

The estimates are contained in aov.full$coefficients, as a named numeric vector.

The 95% confidence intervals are contained in confint(aov.full), which is a matrix. The row names of the matrix contain the parameter names.

We can start by converting the matrix of confidence intervals to a tibble (the tidyverse version of a data frame), turning its row names into a column called Parameter:


resultsTable <- as_tibble(confint(aov.full), rownames="Parameter")

Now let's add a column for the estimates. We can turn the vector aov.full$coefficients into another tibble and bind its column to our existing table with cbind(...). This works because both tables list the parameters in the same order:


resultsTable <- cbind(resultsTable, tibble(Estimate = aov.full$coefficients))

Let's tidy this up a bit. We can round the numerical columns to two decimal places, and combine the two columns representing the confidence interval into a single column:


resultsTable <- resultsTable %>%
  mutate(`2.5 %` = round(`2.5 %`, digits = 2),
         `97.5 %` = round(`97.5 %`, digits = 2),
         Estimate = round(Estimate, digits = 2)) %>%
  mutate(`95% Confidence Interval`=paste0("[", `2.5 %`, ", ", `97.5 %`, "]"))
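One caveat, sketched here with made-up numbers: round() changes the value but not how it is printed, so an endpoint such as 2.1 appears as "2.1" rather than "2.10" in the pasted interval. Formatting with sprintf("%.2f", ...) always keeps two decimal places, which gives a more uniform table.

```r
# Made-up values for illustration
x <- c(2.1, 13.1490209, 0.3168892)

as.character(round(x, 2))   # trailing zero dropped: "2.1" "13.15" "0.32"
sprintf("%.2f", x)          # always two decimals:   "2.10" "13.15" "0.32"

# An interval string with fixed formatting (the DietLF limits from the output above)
paste0("[", sprintf("%.2f", 0.3168892), ", ", sprintf("%.2f", 77.24978), "]")
```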

Finally, we select just the columns we need:


resultsTable <- resultsTable %>%
  select(Parameter, Estimate, `95% Confidence Interval`)
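For comparison, here is a base-R sketch of the same kind of table that needs no tidyverse packages. Because the mouse data must be downloaded, it uses R's built-in ToothGrowth data set (supplement type by dose, a 2 × 3 design like Strain × Diet) purely as a stand-in; the structure of the result, not its numbers, is the point.

```r
# ToothGrowth (built into R) stands in for the mouse data
fit <- aov(len ~ supp * factor(dose), data = ToothGrowth)

ci  <- confint(fit)  # matrix: one row per parameter, columns "2.5 %" and "97.5 %"
est <- coef(fit)     # named numeric vector, in the same order as the rows of ci

tab <- data.frame(
  Parameter = rownames(ci),
  Estimate  = round(est, 2),
  CI        = paste0("[", round(ci[, 1], 2), ", ", round(ci[, 2], 2), "]"),
  row.names = NULL   # use plain row numbers rather than the parameter names
)
# Rename afterwards so data.frame() does not mangle the name
names(tab)[names(tab) == "CI"] <- "95% Confidence Interval"
tab
```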

We can then save resultsTable as a CSV file and copy its contents into a table in a Word document.
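A minimal sketch of the saving step, using base R's write.csv() (readr's write_csv(), loaded with the tidyverse, works similarly). The two-row `exampleTable` is a stand-in built from values in the output above, so that running this does not overwrite your real resultsTable; the file name resultsTable.csv is an arbitrary choice, and the file goes to the current working directory.

```r
# Stand-in for resultsTable, using two rows from the output above
exampleTable <- data.frame(
  Parameter = c("(Intercept)", "StrainTH"),
  Estimate  = c(42.95, 58.29),
  `95% Confidence Interval` = c("[13.15, 72.74]", "[19.82, 96.76]"),
  check.names = FALSE  # keep the column name exactly as written
)

# row.names = FALSE stops R adding an unnamed column of row numbers
write.csv(exampleTable, "resultsTable.csv", row.names = FALSE)

# Read the file back to check it
readBack <- read.csv("resultsTable.csv", check.names = FALSE)
readBack
```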

Outputting graphics

RStudio has options for exporting graphics directly. However, this will not give you publication-quality images. A better way is to export a figure to a graphics file from code.

The basic code structure is:


png("ImageFilename.png", ...)
# Graphics commands
dev.off()

This will create an image file with the name ImageFilename.png. Any graphics commands will be written to an off-screen "graphics device", and when dev.off() ("device off") is called, all the graphics will be written to that file, and the file will be closed.
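Here is a runnable version of that structure, sketched with base graphics so it needs no extra packages; the file name deviceDemo.png is arbitrary. One caveat for ggplot2: a plot is drawn only when it is printed. Typing it at the console prints it automatically, but inside a function or loop, or in a script run with source(), you need to wrap it in print() between png() and dev.off().

```r
# Open an off-screen PNG device: plots now go to the file, not the screen
png("deviceDemo.png", width = 480, height = 480)

# Base graphics commands draw immediately on the active device
plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared")
# For a ggplot object p, you would write print(p) here instead

# Close the device, which writes and closes the file
dev.off()

file.exists("deviceDemo.png")
```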

Controlling the quality

Raster image formats such as PNG represent a picture as a grid of individual dots, called pixels. Each pixel is a small rectangle in one solid color. Our aim is to generate images that look good both on a screen and in print; on a screen, a user might also zoom in to see more detail.

When we create the PNG file, we can specify its size in pixels. The default size is 480 by 480 pixels, which is not a very high resolution.


png("lowResChol.png", width=480, height=480)
ggplot(met, aes(x=Strain, y=Cholesterol, fill=Diet)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_point(position=position_jitterdodge(jitter.width = 0.1, dodge.width = 0.75)) +
  ylab("Cholesterol Level (mg/dl)") +
  ggtitle("Cholesterol Level by Mouse Strain and Diet")
dev.off()

If we increase the image size, we get a much higher-quality image:


png("hiResChol.png", width=8*480, height=8*480)
ggplot(met, aes(x=Strain, y=Cholesterol, fill=Diet)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_point(position=position_jitterdodge(jitter.width = 0.1, dodge.width = 0.75)) +
  ylab("Cholesterol Level (mg/dl)") +
  ggtitle("Cholesterol Level by Mouse Strain and Diet")
dev.off()

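The problem now is that the text is too small to see. Text size is not measured in pixels but in points, and R treats a point as 1/72 of an inch, so text sizes are in "print size", not "image size". The res parameter sets the nominal resolution of the image in pixels per inch, and R uses it to convert points to pixels: each point becomes res/72 pixels. By default this is one pixel per point, so the text keeps the same pixel size it had in the 480-pixel image and looks tiny in a 3840-pixel one. Setting res=4*72 makes each point four pixels, so text is drawn four times larger, matching an image that is four times as wide and high.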
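The arithmetic can be made concrete with a small helper, pngGeometry(), which is hypothetical and written only for this illustration. It applies the relationships above: the print size in inches is pixels divided by res, and each point is res/72 pixels. It assumes, as the examples above imply, that png()'s default behaves like res = 72, i.e. one pixel per point.

```r
# Hypothetical helper: print size and pixels-per-point for a png() call.
# Assumes the default resolution behaves as 72 pixels per inch.
pngGeometry <- function(width_px, height_px, res = 72) {
  c(width_in     = width_px / res,   # print width in inches
    height_in    = height_px / res,  # print height in inches
    px_per_point = res / 72)         # pixels used for each 1/72-inch point
}

pngGeometry(480, 480)                       # default size
pngGeometry(8 * 480, 8 * 480)               # bigger image, text unchanged in pixels
pngGeometry(4 * 480, 4 * 480, res = 4 * 72) # bigger image, text scaled by 4
```

The first and last calls describe the same roughly 6.67-inch square figure, the last with four pixels per point; the middle call describes a 53.3-inch image whose text is still one pixel per point, which is why it looked so small.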


png("hiResChol.png", width=4*480, height=4*480, res=4*72)
ggplot(met, aes(x=Strain, y=Cholesterol, fill=Diet)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_point(position=position_jitterdodge(jitter.width = 0.1, dodge.width = 0.75)) +
  ylab("Cholesterol Level (mg/dl)") +
  ggtitle("Cholesterol Level by Mouse Strain and Diet")
dev.off()
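A related approach, sketched here as an alternative rather than the method above, is to give the size in physical units and let png() work out the pixels: with units = "in", width and height are in inches and the pixel dimensions are inches × res. The 6 × 4 inch size at 300 pixels per inch and the file name are illustrative choices, not journal requirements. (ggplot2's ggsave() offers similar width, height, units, and dpi arguments.)

```r
# Illustrative size and resolution: 6 x 4 inches at 300 ppi = 1800 x 1200 pixels.
# res also converts points to pixels, so text keeps its intended print size.
png("figureInches.png", width = 6, height = 4, units = "in", res = 300)
plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared")  # stand-in for your figure
dev.off()

# Check the pixel size stored in the PNG header: width and height are
# 4-byte big-endian integers in bytes 17-24 of the file
hdr <- readBin("figureInches.png", what = "raw", n = 24)
dims <- c(width  = readBin(hdr[17:20], what = "integer", size = 4, endian = "big"),
          height = readBin(hdr[21:24], what = "integer", size = 4, endian = "big"))
dims
```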