BMR 617: Statistical Techniques for the Biomedical Sciences

R Notebooks

So far, we've been creating R script files that contain the code that performs our analyses. This is a good approach: the R script provides a record of exactly what analyses were performed. In principle, we can submit it as a supplemental file with a publication, and reviewers or readers will be able to see what we did and judge its validity.

Last time, we saw some techniques for taking the results of the analysis performed by an R script and including them in a manuscript. These included writing image files, which we could import into a Word (or similar) document, and creating tables and saving them as CSV files, which we could import into Excel, format visually, and copy into Word. While these approaches are workable (and very common), they are somewhat unwieldy, and every manual copy-and-paste step is another chance to introduce an error.

R Notebooks are a technology that allows us to write documents with R code embedded in them. We can then "knit" the document together, which will run the R code and include the output (if we desire) of that code in the generated document. The "chunks" of R code can be configured so that the code itself can be included, or the output can be included, or both, or neither. Graphics can also be included in the output.
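To make those four cases concrete, here is a small sketch that knits a made-up R Markdown snippet in memory with knitr::knit() and prints the Markdown it produces. The snippet and its arithmetic are placeholders; echo, eval and include are standard knitr chunk options. It assumes the knitr package is installed (it comes with R Markdown support in RStudio).

```r
# A minimal sketch: knit a hypothetical R Markdown snippet in memory and
# inspect the Markdown that knitr produces for different chunk options.
library(knitr)

rmd <- c(
  "Code and output (the default):",
  "```{r}",
  "1 + 1",
  "```",
  "",
  "Output only (echo=FALSE hides the code):",
  "```{r echo=FALSE}",
  "2 + 2",
  "```",
  "",
  "Code only (eval=FALSE shows the code without running it):",
  "```{r eval=FALSE}",
  "3 + 3",
  "```",
  "",
  "Neither (include=FALSE runs the code but shows nothing):",
  "```{r include=FALSE}",
  "hidden <- 4 + 4",
  "```"
)

# With text= and no output file, knit() returns the resulting Markdown
md <- knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")
```

In the printed Markdown, output lines are prefixed with `##`, which is how knitr marks results by default.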

Markdown

R notebooks themselves are written in a flavor of Markdown called (not unreasonably) "R Markdown". Markdown has its roots in web development. Web pages are rendered from a language called HTML ("HyperText Markup Language"), which is fairly simple but very verbose, so writing documents directly in it is time-consuming. Markdown (the name is a play on "Markup"; it is "less than Markup") was created to be easy to write and to read as plain text; it is then converted into HTML (or other formats).

Experimenting with Notebooks

Open RStudio. In the menu, click "File", then "New File", and choose "R Notebook". This opens a new notebook containing some template code. Go to "File" and "Save As", and save the file as "Cholesterol.Rmd" somewhere easy to find (e.g. a folder you have for the coursework, or your Desktop).

At line 12, there is an "R chunk". It starts with three backticks followed by "r" in curly braces, ```{r}, and ends with another line of three backticks.
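One way to see exactly which lines count as R code is to extract them with knitr::purl(), which "tangles" an R Markdown document into a plain R script. The sketch below uses a made-up document around the template's plot(cars) chunk and assumes knitr is installed; on your own file, purl("Cholesterol.Rmd") would write the code to Cholesterol.R.

```r
# Sketch: extract only the R code from an R Markdown document.
# The document text is hypothetical; plot(cars) mirrors the template chunk.
library(knitr)

rmd <- c(
  "Some prose that is not R code.",
  "",
  "```{r}",
  "plot(cars)",
  "```",
  "",
  "More prose."
)

# documentation = 0 keeps only the code (no chunk-header comments);
# purl() does not run the code, so no plot is drawn here
code <- purl(text = rmd, quiet = TRUE, documentation = 0)
cat(code, sep = "\n")
```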

The R chunk in RStudio has a green "Run" button. Press that button to run the chunk of code and see the output. (It will generate a graph.) Press the "x" button on the graph to remove it.

Press the "Preview" button. It should open a web (HTML) document in a separate window (or in the Viewer pane at the bottom right). Note that the graph does not appear.

Run the chunk of code again using the green "Run" button, then preview the file again; you should now see the graph in the output. Previewing does not run any code; it only displays output from code that has already been run.

Click the dropdown arrow next to "Preview" and choose "Knit to HTML". This runs all of the code and generates the output. It also creates an HTML file, which could be uploaded directly to a website.
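Behind the scenes, "Knit" essentially runs rmarkdown::render() on the file, in a fresh R session. The sketch below does the same from the console on a tiny, hypothetical .Rmd written to a temporary folder. It assumes the rmarkdown package is installed and that Pandoc can be found (RStudio bundles Pandoc, but a plain R session may not see it, hence the check). For your own notebook the call would be render("Cholesterol.Rmd", output_format = "html_document").

```r
# Sketch: the console equivalent of "Knit to HTML".
# Assumes rmarkdown is installed; Pandoc is needed to produce HTML.
library(rmarkdown)

# Write a tiny hypothetical R Markdown file to a temporary folder
rmd_file <- file.path(tempdir(), "demo.Rmd")
writeLines(c(
  "---",
  "title: \"Demo\"",
  "---",
  "",
  "```{r}",
  "summary(cars)",
  "```"
), rmd_file)

html_file <- NA_character_
if (pandoc_available()) {
  # Runs every chunk from scratch and writes demo.html next to demo.Rmd
  html_file <- render(rmd_file, output_format = "html_document", quiet = TRUE)
  print(html_file)
} else {
  message("Pandoc not found; cannot render in this session.")
}
```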

Writing a Notebook

In the header section at the top, which is enclosed between two lines of three minus signs (---), edit the title so that you have:


---
title: "Cholesterol levels of TALLYHO and C57BL/6 mice fed three different diets"
...
---
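The header is written in YAML: each line is a "key: value" pair. As an illustration, the rmarkdown package's yaml_front_matter() function parses this header into an R list. The temporary file and the output: html_notebook line below are assumptions for the sketch (the latter is what the R Notebook template typically uses).

```r
# Sketch: read the YAML header of a hypothetical R Markdown file.
library(rmarkdown)

rmd_file <- file.path(tempdir(), "header-demo.Rmd")
writeLines(c(
  "---",
  "title: \"Cholesterol levels of TALLYHO and C57BL/6 mice fed three different diets\"",
  "output: html_notebook",
  "---",
  "",
  "Body text goes here."
), rmd_file)

header <- yaml_front_matter(rmd_file)
header$title   # the title string
header$output  # "html_notebook"
```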

Delete everything below the header section.

Insert some text that briefly describes the experiment. (You can just insert some placeholder text if you like).

Click the "Insert" button and choose "R". This will create an R chunk. We need to load the tidyverse library and the data. Enter the following code inside the R chunk:


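# Load the tidyverse packages (readr, tidyr, dplyr, ggplot2, ...)
library(tidyverse)
# Read the data, split MouseID (of the form Strain-Diet-ID) into three columns,
# make Strain and Diet factors, and keep only the columns we need
met <- read_csv("https://denvirlab.marshall.edu/BMR617-2022/data/TH-B6-metabolic.csv") %>%
  separate(MouseID, into=c("Strain", "Diet", "ID"), sep='-') %>%
  mutate(Strain = factor(Strain), Diet=factor(Diet)) %>%
  select(Strain, Diet, Cholesterol)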
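If the separate() step is unclear, here is what it does to a couple of made-up IDs. The IDs are hypothetical and only assume the Strain-Diet-ID pattern implied by the code above; the real file may use different codes. (Recent versions of tidyr mark separate() as superseded by separate_wider_delim(), but it still works.)

```r
library(tidyverse)

# Hypothetical IDs in the assumed "Strain-Diet-ID" format
toy <- tibble(MouseID = c("TH-D1-01", "B6-D2-02"))

toy_split <- toy %>%
  separate(MouseID, into = c("Strain", "Diet", "ID"), sep = "-") %>%
  mutate(Strain = factor(Strain), Diet = factor(Diet))

toy_split
levels(toy_split$Strain)  # factor levels are sorted alphabetically: "B6" "TH"
```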

Add the following text (or something similar) below the R chunk:

The data are shown in the following table and graph:

Insert another R chunk with the following:


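# Print the data (it appears as a table in the output)
met
# Box plots of cholesterol for each strain, with the diets side by side;
# outlier.shape = NA hides the box plots' outlier markers, since every
# point is drawn by geom_point() anyway
ggplot(met, aes(x=Strain, y=Cholesterol, fill=Diet)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(position=position_jitterdodge(jitter.width = 0.1, dodge.width = 0.75)) +
  ylab("Cholesterol Level (mg/dl)") +
  ggtitle("Cholesterol Level by Mouse Strain and Diet")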

Add some text below the R chunk describing what statistical test was done for the data. After that text, create another R chunk and add the R code:


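# Two-way ANOVA with interaction: Strain + Diet + Strain:Diet
aov.full <- aov(Cholesterol ~ Strain * Diet, data = met)
# Estimated model coefficients
aov.full$coefficients
# 95% confidence intervals for the coefficients
confint(aov.full)
# ANOVA table (F test for each term)
summary(aov.full)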
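In the formula Cholesterol ~ Strain * Diet, the * is shorthand for both main effects plus their interaction (Strain + Diet + Strain:Diet). You can check this on R's built-in ToothGrowth data, which has a similar shape: a two-level factor (supp) and, once dose is converted to a factor, a three-level one. This only illustrates the formula syntax; it is not an analysis of the cholesterol data. Note that coef(fit) returns the same values as fit$coefficients.

```r
# Built-in data: tooth length by supplement type (2 levels) and dose (3 levels)
tg <- transform(ToothGrowth, dose = factor(dose))

fit <- aov(len ~ supp * dose, data = tg)

# The * expands to both main effects plus the interaction term
attr(terms(fit), "term.labels")
# Intercept + 1 (supp) + 2 (dose) + 1*2 (interaction) = 6 coefficients
coef(fit)
confint(fit)
summary(fit)
```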

Examining the output

Press the "Knit" button, which should have replaced the "Preview" button. (If you still have a "Preview" button, press the dropdown button next to it and choose "Knit to HTML".) Examine the output.

Note how the table is displayed in HTML, with pagination.
Note how both the R code and its output are displayed, and how they are formatted differently.

Controlling the output from R code

In the first R chunk, where we load the tidyverse library and the data, the output is not particularly helpful: it consists of messages printed while loading the packages and reading the file. However, we might still want to include the code itself, so that other users can see how we ran it and can run the same thing themselves.

In that first chunk of code, edit the opening line to read


```{r message=FALSE}

Knit the document to HTML again, and look at the new output.
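The message option affects only messages (such as the startup notes from library(tidyverse) or the column summary from read_csv()); the code and any printed results are still shown. A minimal sketch, knitting a made-up snippet in memory with knitr (assumed installed):

```r
library(knitr)

rmd <- c(
  "```{r}",
  "message(\"first message\")",
  "```",
  "",
  "```{r message=FALSE}",
  "message(\"second message\")",
  "```"
)

# The first message appears in the output; the second is suppressed,
# although the code that produced it is still echoed
md <- knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")
```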

Conversely, we might want to use our notebook to generate output without including the code we used to create it. This is particularly true for figures and tables. Edit the second R chunk so it reads


```{r echo=FALSE}
met
ggplot(met, aes(x=Strain, y=Cholesterol, fill=Diet)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_point(position=position_jitterdodge(jitter.width = 0.1, dodge.width = 0.75)) +
  ylab("Cholesterol Level (mg/dl)") +
  ggtitle("Cholesterol Level by Mouse Strain and Diet")
```

Knit the notebook again, and examine the output for the table and graph.
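If you want to hide the code in every chunk rather than editing each header, knitr lets you change the defaults with knitr::opts_chunk$set(), usually in a first chunk marked include=FALSE. In this in-memory sketch (a hypothetical snippet, assuming knitr is installed), the setup chunk turns echo off, so the next chunk shows only its result, while a chunk that sets echo=TRUE shows its code again.

```r
library(knitr)

rmd <- c(
  "```{r setup, include=FALSE}",
  "knitr::opts_chunk$set(echo = FALSE)",
  "```",
  "",
  "```{r}",
  "mean(c(2, 4, 6))",
  "```",
  "",
  "```{r echo=TRUE}",
  "max(c(2, 4, 6))",
  "```"
)

# knit() restores the original chunk defaults when it finishes
md <- knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")
```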

Examining other formats

Experiment with knitting to PDF and Word formats (use the dropdown button next to the "Knit" or "Preview" button).
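The same formats can be requested from the console by name in rmarkdown::render(). A sketch, assuming rmarkdown and Pandoc are available and using a hypothetical temporary file: Word output needs only Pandoc, whereas PDF output also needs a LaTeX installation (tinytex::install_tinytex() can provide one), so the PDF call is left commented out.

```r
# Sketch: render one hypothetical .Rmd file to Word from the console.
# Assumes rmarkdown is installed and Pandoc is available.
library(rmarkdown)

rmd_file <- file.path(tempdir(), "formats-demo.Rmd")
writeLines(c(
  "---",
  "title: \"Formats demo\"",
  "---",
  "",
  "Some **bold** text.",
  "",
  "```{r}",
  "summary(cars)",
  "```"
), rmd_file)

outputs <- character(0)
if (pandoc_available()) {
  # Word output needs only Pandoc; the result is a .docx file
  outputs <- c(outputs, render(rmd_file, output_format = "word_document", quiet = TRUE))
  # PDF output additionally needs LaTeX; uncomment if it is installed:
  # outputs <- c(outputs, render(rmd_file, output_format = "pdf_document", quiet = TRUE))
}
print(outputs)
```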

Formatting text

Look at the R Markdown Cheatsheet. Section three shows ways to format the text. Experiment with making some text italic and/or bold, creating headers, lists, etc.
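To see what a few of these Markdown marks turn into, you can convert a short string to HTML in R. This sketch uses the commonmark package (assumed installed). It is a CommonMark parser, whereas R Markdown converts documents with Pandoc, so edge cases can differ, but the basic syntax below behaves the same in both.

```r
library(commonmark)

md <- c(
  "# A level-one header",
  "",
  "Some *italic* and **bold** text.",
  "",
  "- first item",
  "- second item"
)

# markdown_html() takes a single string and returns the HTML as a string
html <- markdown_html(paste(md, collapse = "\n"))
cat(html)
```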

Are Notebooks the Future? (Controversial)

There is a small but growing group of scientists who contend that the traditional process of peer-reviewed publication is not well suited to the way science is now conducted and, in particular, consumed. The peer-reviewed journal process is designed around print documents, yet most scientists now rarely, if ever, read print articles, reading them online instead. However, online articles do not take advantage of the interactive nature of web pages, nor of other techniques (such as version control) that are common in modern web applications. Advanced uses of notebooks can support interactivity, versioning, and other modern presentation techniques. This may be the future of publication.

It is important to note that there is still a need for some form of curation, or quality control, which is currently provided by the peer-review process. How best to achieve this alongside modern, interactive publication techniques remains an open problem. To date, most notebook-based publications have been posted on preprint servers (which host articles before peer review). A few online journals (for example, F1000 Research) have built hybrid publication models using some of these techniques.

Watch this space! The COVID-19 pandemic, which highlighted the need to share research data far faster than the peer-review process allowed, has probably accelerated our understanding of the benefits of other publication models.