Statistical analyses are an important part of any experiment or study. The same principles that apply to other parts of the scientific process also apply to the statistical analyses. In particular:
Journal articles are typically presented in three main sections:
Do not interpret the results in the results section. The biological implications of the results belong in the discussion section.
library(tidyverse)

# Load the data and split MouseID into Strain, Diet and ID columns
met <- read_csv("https://denvirlab.marshall.edu/BMR617-2022/data/TH-B6-metabolic.csv") %>%
  separate(MouseID, into=c("Strain", "Diet", "ID"), sep='-')

# Two-way ANOVA with interaction
aov.full <- aov(Cholesterol ~ Strain * Diet, data = met)
aov.full$coefficients
confint(aov.full)
summary(aov.full)

# Save the figure as a high-resolution PNG
png(filename = "figure1.png", width=4*960, height = 4*480, res=4*72)
ggplot(met, aes(x=Strain, y=Cholesterol, fill=Diet)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(position=position_jitterdodge(jitter.width = 0.1, dodge.width = 0.75)) +
  ylab("Cholesterol Level (mg/dl)") +
  ggtitle("Cholesterol Level by Mouse Strain and Diet")
dev.off()

# Record the R and package versions used
sessionInfo()
Here is the console output:
> library(tidyverse)
> met <- read_csv("https://denvirlab.marshall.edu/BMR617-2022/data/TH-B6-metabolic.csv") %>%
+ separate(MouseID, into=c("Strain", "Diet", "ID"), sep='-')
Rows: 29 Columns: 7
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): MouseID
dbl (6): BodyWeight, Insulin, TG, Cholesterol, Glucose, FatMass
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
>
> aov.full <- aov(Cholesterol ~ Strain * Diet, data = met)
> aov.full$coefficients
    (Intercept)        StrainTH          DietHF          DietLF StrainTH:DietHF StrainTH:DietLF
      42.945000       58.290000       72.140998       38.783333        8.934004        7.221669
> confint(aov.full)
                      2.5 %    97.5 %
(Intercept)      13.1490209  72.74098
StrainTH         19.8235566  96.75644
DietHF           32.1654973 112.11650
DietLF            0.3168892  77.24978
StrainTH:DietHF -49.1490529  67.01706
StrainTH:DietLF -45.5208619  59.96420
> summary(aov.full)
            Df Sum Sq Mean Sq F value   Pr(>F)
Strain       1  19984   19984  24.082 5.87e-05 ***
Diet         2  25773   12887  15.529 5.40e-05 ***
Strain:Diet  2    102      51   0.062     0.94
Residuals   23  19086     830
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> png(filename = "figure1.png", width=4*960, height = 4*480, res=4*72)
> ggplot(met, aes(x=Strain, y=Cholesterol, fill=Diet)) +
+ geom_boxplot(outlier.shape = NA) +
+ geom_point(position=position_jitterdodge(jitter.width = 0.1, dodge.width = 0.75)) +
+ ylab("Cholesterol Level (mg/dl)") +
+ ggtitle("Cholesterol Level by Mouse Strain and Diet")
> dev.off()
RStudioGD
2
>
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_2.0.2 tidyr_1.1.4 tibble_3.1.5
[8] ggplot2_3.3.5 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] tidyselect_1.1.1 haven_2.4.3 colorspace_2.0-2 vctrs_0.3.8 generics_0.1.0 yaml_2.2.1 utf8_1.2.2
[8] rlang_0.4.11 pillar_1.6.3 glue_1.4.2 withr_2.4.2 DBI_1.1.1 bit64_4.0.5 dbplyr_2.1.1
[15] modelr_0.1.8 readxl_1.3.1 lifecycle_1.0.1 munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0 rvest_1.0.2
[22] labeling_0.4.2 tzdb_0.1.2 parallel_4.0.3 curl_4.3.2 fansi_0.5.0 broom_0.7.9 Rcpp_1.0.7
[29] scales_1.1.1 backports_1.2.1 vroom_1.5.5 jsonlite_1.7.2 farver_2.1.0 fs_1.5.0 bit_4.0.4
[36] digest_0.6.28 hms_1.1.1 stringi_1.7.5 grid_4.0.3 cli_3.0.1 tools_4.0.3 magrittr_2.0.1
[43] crayon_1.4.1 pkgconfig_2.0.3 ellipsis_0.3.2 xml2_1.3.2 reprex_2.0.1 lubridate_1.8.0 assertthat_0.2.1
[50] httr_1.4.2 rstudioapi_0.13 R6_2.5.1 compiler_4.0.3
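The numbers reported in a results section can be pulled out of the fitted model programmatically rather than copied by hand from the console. The following is a minimal sketch of one way to do that. It assumes the built-in ToothGrowth dataset as a stand-in for the mouse data, and report_term is a hypothetical helper name, not part of R or of the analysis above.

```r
# Stand-in data: the built-in ToothGrowth dataset (NOT the mouse data above).
tg <- transform(ToothGrowth, dose = factor(dose))

# Two-way ANOVA with interaction, the same model form as Cholesterol ~ Strain * Diet.
fit <- aov(len ~ supp * dose, data = tg)

# summary() on an aov fit returns a list; its first element is the ANOVA table.
tab <- summary(fit)[[1]]

# Row names are padded with spaces for printing, so trim them before indexing.
rownames(tab) <- trimws(rownames(tab))

# Hypothetical helper: format one term as "F(df1, df2) = value, p = value".
report_term <- function(tab, term) {
  sprintf("F(%d, %d) = %.2f, p = %s",
          as.integer(tab[term, "Df"]),
          as.integer(tab["Residuals", "Df"]),
          tab[term, "F value"],
          format.pval(tab[term, "Pr(>F)"], digits = 2))
}

print(report_term(tab, "supp:dose"))
```

With the mouse data, the same helper could be applied to the table from summary(aov.full) using the term name "Strain:Diet".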
For the methods section, we need to state the analysis that was performed.
Here that was a two-way ANOVA with interaction, so that
should be the first thing we state. The sessionInfo() command
reports the version of R and of each package we are using.
The effect of mouse strain and diet on cholesterol level was assessed using a two-way ANOVA with interaction.
All statistical analyses were performed using the statistical computing environment R [1], version 4.0.3, with the "tidyverse" package [2], version 1.3.1.
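Version numbers can also be read directly in code instead of from the sessionInfo() printout, which avoids transcription errors. This is a small sketch rather than a required step. It uses the "stats" package as a stand-in because it ships with every R installation; in the analysis above, "tidyverse" would be the package name to use.

```r
# Version of the running R installation, e.g. "4.0.3".
r_version <- as.character(getRversion())

# Version of a package. "stats" is a stand-in here (an assumption for
# illustration); replace it with "tidyverse" for the analysis above.
pkg <- "stats"
pkg_version <- as.character(packageVersion(pkg))

# Assemble the software sentence for the methods section.
methods_sentence <- sprintf(
  'All statistical analyses were performed using R, version %s, with the "%s" package, version %s.',
  r_version, pkg, pkg_version
)
cat(methods_sentence, "\n")
```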
To get the correct references, we can use the citation() function:
citation()
citation("tidyverse")
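By default citation() prints a banner with the reference embedded in it. If a plain-text or BibTeX entry is more convenient for a reference manager, the citation object can be converted. The sketch below shows this for R itself; passing a package name such as "tidyverse" to citation() should work the same way, provided that package is installed.

```r
# Citation for R itself, returned as a "bibentry" object.
r_cite <- citation()

# Plain-text form, suitable for pasting into a reference list.
r_text <- format(r_cite, style = "text")
cat(r_text, sep = "\n")

# BibTeX form, suitable for importing into a reference manager.
r_bib <- toBibtex(r_cite)
print(r_bib)
```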
So here our references would be
The R code above generates a PNG graphics file:
The figure should have a legend that states simply what is represented.
If you choose to use bar charts, then the figure legend should state what the error bars represent.
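As an illustration of the bar chart case, the sketch below computes the group means and the standard error of the mean (SEM) explicitly, so that the legend can state exactly what the error bars show. It is not part of the analysis above: it assumes the built-in ToothGrowth dataset as a stand-in, sem is a hypothetical helper, and the plot is only drawn if ggplot2 is installed.

```r
# Stand-in data: the built-in ToothGrowth dataset (NOT the mouse data above).
tg <- transform(ToothGrowth, dose = factor(dose))

# Hypothetical helper: standard error of the mean.
sem <- function(x) sd(x) / sqrt(length(x))

# Mean and SEM of tooth length for each supplement/dose group.
# Both aggregate() calls use the same formula, so rows come back in the same order.
means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
sems  <- aggregate(len ~ supp + dose, data = tg, FUN = sem)
bars  <- data.frame(means[c("supp", "dose")], mean = means$len, sem = sems$len)
print(bars)

# Bars show group means; error bars show mean +/- SEM, which the legend must state.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(bars, aes(x = dose, y = mean, fill = supp)) +
    geom_col(position = position_dodge(width = 0.9)) +
    geom_errorbar(aes(ymin = mean - sem, ymax = mean + sem),
                  position = position_dodge(width = 0.9), width = 0.2) +
    ylab("Tooth length")
  print(p)
}
```

A legend for this plot would then read along the lines of "Bars show group means; error bars show the standard error of the mean." If standard deviations or confidence intervals are plotted instead, the legend should say so.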