R authoring with markdown and knitr
Objectives
- Understand the concept of dynamic documents and reproducible research
- Learn R markdown basics
- Produce a simple vignette
This content is adapted from the RStudio R Markdown - Dynamic Documents of R, Markdown basics and R code chunks tutorials.
This session introduces tools to author documents that include dynamically generated analysis results (tables, figures, …) produced with R. Bringing data, results and their interpretation together in a single, coherent document has several benefits:
- It is invaluable to keep track of long and complex analyses.
- It ensures the reproducibility of both the pipeline and the final report, because any update at the data or analysis level is propagated to the report.
- It helps communicate these results comprehensively to collaborators.

A popular solution for this is literate programming, a technique and set of tools that allow one to … R code chunks. Such documents are called R markdown documents and have the Rmd extension; more on this in the next section. Steps 2 to 4 can be executed individually or automated into a single command such as knitr::knit2html (i.e. the function knit2html from the package knitr) or rmarkdown::render, or using the RStudio editor.
Other types of documents and frameworks that combine programming and authoring languages include:
- Sweave files (with the Rnw extension), which combine LaTeX and R
- Jupyter/IPython, for Python, R and other languages
- orgmode, …
R Markdown
R Markdown is an authoring format that enables easy creation of dynamic documents, presentations, and reports from R. It combines the core syntax of markdown (an easy-to-write plain text format) with embedded R code chunks. R Markdown documents are fully reproducible: they can be automatically regenerated whenever the underlying R code or data changes.
This document describes R Markdown v2. It is based on knitr and pandoc, the workhorse that converts markdown to html and many other formats. We will focus on generating reports such as this document in html and pdf, although other formats and types of documents are available.
PDF output requires a full installation of TeX. pandoc is a third-party application that needs to be installed outside of R. RStudio bundles all necessary R packages and pandoc, so neither step is needed if you use it.
Tip
We would also like to warn against using MS Word as the output format, as this breaks the support for reproducibility. The final, compiled document should be used for viewing only, which is implicit for html or pdf files. All editing should be done on the original document, i.e. the Rmd file.
You can install the required packages from CRAN as follows:
install.packages("knitr")
install.packages("rmarkdown")
These packages are pre-installed with RStudio.
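Before rendering, it can help to check from R which of these tools are actually available. The sketch below only reports what is missing; it does not install anything. rmarkdown::pandoc_available() is an rmarkdown helper. Checking for pdflatex is an assumption, since other LaTeX engines can also produce PDF output.

```r
# Packages used in this session; both are available from CRAN.
pkgs <- c("knitr", "rmarkdown")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "),
          "; install them with install.packages()")
}

# pandoc is needed by rmarkdown::render(); only query it if rmarkdown is there.
has_pandoc <- installed[["rmarkdown"]] && rmarkdown::pandoc_available()

# PDF output additionally needs a LaTeX engine; pdflatex is assumed here.
has_latex <- unname(nzchar(Sys.which("pdflatex")))

print(c(pandoc = has_pandoc, pdflatex = has_latex))
```

If has_pandoc is FALSE outside RStudio, pandoc has to be installed separately, as noted above.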
The figure below, taken from the RStudio markdown (v2) tutorial, illustrates basic markdown syntax and its output using RStudio.
- Section headers can be defined by underlining them with ====== or ----- (level 1 and 2 respectively), or by prefixing them with one or multiple # (for level 1, 2, … respectively).
- Italic and bold fonts are defined using one or two * around the text.
- Bullet list items start with a -.
- In-line code and verbatim expressions are surrounded by back ticks (`).
- Code blocks start and end with three back ticks.
- Starting a line with > offsets the text (block quote).
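You can convert a markdown snippet to HTML directly from R to see what these constructs turn into. The sketch below uses the commonmark package, a separate CRAN package that implements the CommonMark dialect. It is swapped in here for illustration only: R Markdown itself relies on pandoc, whose markdown dialect differs in some details. The snippet is hypothetical.

```r
library("commonmark")

# A hypothetical snippet exercising the syntax listed above.
md <- c(
  "## A level 2 header",
  "",
  "Some *italic* and **bold** text, and `in-line code`.",
  "",
  "- first item",
  "- second item",
  "",
  "> an offset (block-quoted) line"
)

# markdown_html() takes a single string and returns the rendered HTML.
html <- markdown_html(paste(md, collapse = "\n"))
cat(html)
```

Each construct maps to one HTML element: <h2>, <em>, <strong>, <code>, <ul>/<li> and <blockquote>.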
R Markdown version 2 uses an optional header to define, among other things, the title, author and output formats of the R Markdown document. Below, we use html as the final format; replace html_document with pdf_document to produce a pdf report.
---
title: "Title comes here"
author: "Your name"
date: "12 June 2015"
output: html_document
---
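The same header can also be written to a file from R and parsed back with rmarkdown::yaml_front_matter(). This is a quick way to check that the YAML is valid before knitting. In this minimal sketch, the body text and the summary(cars) chunk are hypothetical filler. The file is written to tempdir() so that the example has no side effects.

```r
rmd <- file.path(tempdir(), "my_rr_document.Rmd")

# Header as shown above, followed by a short hypothetical body.
writeLines(c(
  "---",
  'title: "Title comes here"',
  'author: "Your name"',
  'date: "12 June 2015"',
  "output: html_document",
  "---",
  "",
  "Some text and a code chunk:",
  "",
  "```{r}",
  "summary(cars)",
  "```"
), rmd)

# Parse only the YAML header into an R list.
header <- rmarkdown::yaml_front_matter(rmd)
str(header)
```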
Rmd to html (or pdf)
If you are using RStudio, the simplest way to generate your final output is to open your Rmd file and click the Knit HTML (or Knit PDF, …) button.
From R, you can use the knitr::knit2html or rmarkdown::render functions and give the Rmd source file as input.
Both options first use the knitr::knit function to weave the document. This generates the markdown (md) file that includes the code outputs. The final output document is then rendered in one of two ways:
- with markdown::markdownToHTML, in the case of knitr::knit2html
- with pandoc, in the case of the more recent rmarkdown::render
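You can call knitr::knit directly to see the intermediate weaving step on its own. It produces the md file without needing pandoc. In this minimal sketch, the file name weave_demo.Rmd and its content are hypothetical, and both files are written to a temporary directory.

```r
library("knitr")

rmd <- file.path(tempdir(), "weave_demo.Rmd")
writeLines(c(
  "A tiny document.",
  "",
  "```{r}",
  "1 + 1",
  "```"
), rmd)

# Weave: run the chunks and write a markdown file containing code and output.
md <- knit(rmd, output = file.path(tempdir(), "weave_demo.md"), quiet = TRUE)

# The md file now holds the code and its output, prefixed with ##.
cat(readLines(md), sep = "\n")
```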
library("knitr")
knit2html("my_rr_document.Rmd")
library("rmarkdown")
render("my_rr_document.Rmd") ## default output is html
render("my_rr_document.Rmd", output_format = "html_document")
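For pdf output, knitr::knit2pdf is meant for LaTeX-based sources such as Sweave-style Rnw files, not for Rmd files:
knit2pdf("my_rr_document.Rnw")
For an Rmd file, use rmarkdown instead:
library("rmarkdown")
render("my_rr_document.Rmd", output_format = "pdf_document")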
And, to render all output formats defined in the header:
render("my_rr_document.Rmd", output_format = "all")
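For output_format = "all" to produce more than one file, the output field of the header must list several formats. The minimal sketch below assumes html_document and pdf_document with default options. It only parses the header, so neither pandoc nor LaTeX is needed to run it; the file name is hypothetical.

```r
rmd <- file.path(tempdir(), "multi_format.Rmd")
writeLines(c(
  "---",
  'title: "Title comes here"',
  "output:",
  "  html_document: default",
  "  pdf_document: default",
  "---",
  "",
  "Body text."
), rmd)

# Each entry under output: becomes a named element of the parsed list.
formats <- names(rmarkdown::yaml_front_matter(rmd)$output)
print(formats)
```

With such a header, render(rmd, output_format = "all") would produce one html and one pdf file.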
Exercise: Experiment with R markdown and the features described so far. To create your starting document, create a new R Markdown file using the RStudio menu, or copy/paste the template below.
---
title: "Title comes here"
author: "Your name"
date: "12 June 2015"
output: html_document
---
This is an `R` Markdown document. Markdown is a simple formatting syntax
for authoring HTML, PDF, and MS Word documents. For more details on
using `R` Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that
includes both content as well as the output of any embedded `R` code
chunks within the document. You can embed an `R` code chunk like this:
You can use * or _ to format italic and bold text.
*italic* **bold**
_italic_ __bold__
## Header 2
### Header 3
Unordered List:
* Item 1
* Item 2
    + Item 2a
    + Item 2b
Ordered List:
1. Item 1
2. Item 2
3. Item 3
    + Item 3a
    + Item 3b
To use links, enclose the link text in [] and the actual link in (): [my link](http://linkurl.com)
or use a plain http address:
http://example.com
[linked phrase](http://example.com)
To add a static figure to the document, use the link syntax and precede it with !: ![image text](./fig/myfig.png). Image sources can be online or local files.
![alt text](http://example.com/logo.png)
![alt text](figures/img.png)
A friend once said:
It’s always better to give than to receive.
A friend once said:
> It's always better to give than to receive.
Plain code blocks are displayed in a fixed-width font but are not evaluated (see below for evaluated code chunks). Start and end them with three back ticks (see figure above).
This text is displayed verbatim / preformatted
We can also define in-line code using single back ticks.
We can also define `in-line` code using single back ticks.
Three or more asterisks or dashes produce a horizontal rule:
******
------
There is a simple markdown syntax to produce adequately formatted tables:
First Header | Second Header |
---|---|
Content Cell | Content Cell |
Content Cell | Content Cell |
Content Cell | Content Cell |
which is produced with
First Header | Second Header
------------- | -------------
Content Cell | Content Cell
Content Cell | Content Cell
Content Cell | Content Cell
You can embed LaTeX or MathML equations in R Markdown files using the following syntax:
- $equation$ for inline equations (there must be no white space adjacent to the $ delimiters)
- $$ equation $$ for display equations
- <math>...</math> for MathML equations.
Subscripts and superscripts can be written with ~ and ^ respectively, for example:
H₂O is a liquid. 2¹⁰ is 1024.
H~2~O is a liquid. 2^10^ is 1024.
Exercise: Complement your Rmd file with some new syntax elements.
R code chunks
To include R code in the R markdown file, the native code block syntax (three back ticks) is augmented with chunk tags inside {r, ...}, as illustrated below:
The following code chunk options are available:
- {r chunkname}: the first unnamed string is used to name the code chunk. This is useful for following the code execution and for debugging.
- {r, eval=TRUE}: by default, the code in the chunk is executed. Set eval=FALSE to display the code without running it.
- {r, echo=TRUE}: by default, the code is displayed before its output. Use echo=FALSE to hide the code chunk content.
- Control whether messages, warnings or errors are displayed with {r, message=TRUE, warning=TRUE, error=TRUE}, or set them to FALSE.
- Figure dimensions (in inches) can be controlled with fig.height and fig.width.
- To avoid repeating long calculations over and over again, you can cache specific code chunks by specifying cache=TRUE in the chunk header.
- To execute in-line code, use ` r 1+1` (without the space in front of the r, though).
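The effect of echo, eval and in-line code can be checked without creating any file, using the text argument of knitr::knit. With this argument, knit returns the knitted markdown instead of writing an output file. The chunk labels and contents in this sketch are hypothetical.

```r
library("knitr")

src <- c(
  "```{r answer-demo, echo=FALSE}",
  "x <- 21 * 2",
  "x",
  "```",
  "",
  "```{r not-run, eval=FALSE}",
  "stop('this chunk is shown but never executed')",
  "```",
  "",
  "The answer is `r x`."
)

# knit() with text= returns the result as a character vector.
out <- paste(knit(text = src, quiet = TRUE), collapse = "\n")
cat(out)
```

In the output, the first chunk shows only its result (## [1] 42) and not its code. The second chunk shows its code, but the code is never run. The in-line expression is replaced by its value.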
Tables can easily be printed inside a code chunk. Below, we explicitly create and use a data.frame.
dfr <- data.frame(name = c("John", "David", "Caroline", "Igor"),
                  id = c(123, 234, 321, 231),
                  gender = c("M", "M", "F", "M"))
dfr
##       name  id gender
## 1     John 123      M
## 2    David 234      M
## 3 Caroline 321      F
## 4     Igor 231      M
Tables produced in R as data frames or matrices can be rendered with the helper function knitr::kable, and are then displayed as formatted tables.
library("knitr")
kable(dfr)
name | id | gender |
---|---|---|
John | 123 | M |
David | 234 | M |
Caroline | 321 | F |
Igor | 231 | M |
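kable also accepts arguments that adjust its output, such as format, col.names and align. The sketch below re-creates the dfr data frame so that it runs on its own. The column labels and alignments are arbitrary choices for illustration.

```r
library("knitr")

dfr <- data.frame(name = c("John", "David", "Caroline", "Igor"),
                  id = c(123, 234, 321, 231),
                  gender = c("M", "M", "F", "M"))

# Markdown (pipe) table with custom headers: name left, id right, gender centred.
tab <- kable(dfr, format = "markdown",
             col.names = c("Name", "ID", "Gender"),
             align = c("l", "r", "c"))
print(tab)

# The same data as a raw HTML table.
html_tab <- kable(dfr, format = "html")
```

Inside a code chunk, kable(dfr) without a format argument lets knitr choose a format suited to the output document.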
Exercise: Using the iris data set, create a reproducible report that documents the data (dimensions, summary statistics, …) and provides a set of visualisations (a PCA plot, pairs, …). To conclude your report, add a Session information section with the output of sessionInfo().
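As a possible starting point for the exercise, the R code below could be split across the report's code chunks. It is one sketch among many. Scaling the variables before the PCA (scale. = TRUE) and colouring points by species are choices made here, not requirements of the exercise.

```r
# Dimensions and summary statistics of the data
dim(iris)
summary(iris)

# PCA on the four numeric measurements, scaled to unit variance
pca <- prcomp(iris[, 1:4], scale. = TRUE)
summary(pca)
plot(pca$x[, 1:2], col = iris$Species, pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)

# Pairwise scatter plots of the measurements
pairs(iris[, 1:4], col = iris$Species)

# Session information, for the final section of the report
sessionInfo()
```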
- knitr package, including excellent documentation.
- markdown and rmarkdown packages
## R version 3.2.0 Patched (2015-04-22 r68234)
## Platform: x86_64-unknown-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.2 LTS
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets base
##
## other attached packages:
## [1] knitr_1.10.5 BiocStyle_1.7.3
##
## loaded via a namespace (and not attached):
## [1] magrittr_1.5 formatR_1.2 tools_3.2.0 htmltools_0.2.6
## [5] yaml_2.1.13 stringi_0.4-1 rmarkdown_0.6.1 highr_0.5
## [9] stringr_1.0.0 digest_0.6.8 evaluate_0.7