---
title: "RIBO-Seq Workflow Template" 
author: "Author: Daniela Cassol (danielac@ucr.edu) and Thomas Girke (thomas.girke@ucr.edu)"
date: "Last update: `r format(Sys.time(), '%d %B, %Y')`" 
output:
  BiocStyle::html_document:
    toc_float: true
    code_folding: show
  BiocStyle::pdf_document: default
package: systemPipeR
vignette: |
  %\VignetteIndexEntry{RIBO-Seq Workflow Template}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
fontsize: 14pt
bibliography: bibtex.bib
---

```{css, echo=FALSE}
pre code {
white-space: pre !important;
overflow-x: scroll !important;
word-break: keep-all !important;
word-wrap: initial !important;
}
```

<!--
- Compile from command-line
Rscript -e "rmarkdown::render('systemPipeRIBOseq.Rmd', c('BiocStyle::html_document'), clean=F); knitr::knit('systemPipeRIBOseq.Rmd', tangle=TRUE)"; Rscript ../md2jekyll.R systemPipeRIBOseq.knit.md 6; Rscript -e "rmarkdown::render('systemPipeRIBOseq.Rmd', c('BiocStyle::pdf_document'))"
-->

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
options(width=60, max.print=1000)
knitr::opts_chunk$set(
    eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
    cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE")), 
    tidy.opts=list(width.cutoff=60), tidy=TRUE)
```

```{r setup, echo=FALSE, messages=FALSE, warnings=FALSE}
suppressPackageStartupMessages({
    library(systemPipeR)
    library(BiocParallel)
    library(Biostrings)
    library(Rsamtools)
    library(GenomicRanges)
    library(ggplot2)
    library(GenomicAlignments)
    library(ShortRead)
    library(ape)
    library(batchtools)
})
```

# Ribo-Seq Workflow

This workflow demonstrates how to use various utilities for building and running automated end-to-end analysis workflows for _`RIBO-Seq`_ data. The full workflow can be found here:
[HTML](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRIBOseq.html), [.Rmd](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRIBOseq.Rmd), and [.R](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRIBOseq.R).

## Loading package and workflow template

Load the _`RIBO-Seq`_ sample workflow into your current working directory.

```{r genRibo_workflow_single, eval=FALSE}
library(systemPipeRdata)
genWorkenvir(workflow="riboseq")
setwd("riboseq")
```

The working environment of the sample data loaded in the previous step contains the following preconfigured directory structure. Directory names are indicated in  <span style="color:grey">_**grey**_</span>. Users can change this structure as needed, but need to adjust the code in their workflows accordingly. 

* <span style="color:grey">_**riboseq/**_</span> 
    + This is the directory of the R session running the workflow.
    + Run script ( _\*.Rmd_) and sample annotation (_targets.txt_) files are located here.
    + Note, this directory can have any name (_e.g._ <span style="color:grey">_**riboseq**_</span>). Changing its name does not require any modifications in the run script(s).
    + Important subdirectories: 
        + <span style="color:grey">_**param/**_</span> 
            + Stores parameter files such as: _\*.param_, _\*.tmpl_ and _\*\_run.sh_.
        + <span style="color:grey">_**data/**_ </span>
            + FASTQ samples 
            + Reference FASTA file
            + Annotations
            + etc.
        + <span style="color:grey">_**results/**_</span>
            + Alignment, variant and peak files (BAM, VCF, BED) 
            + Tabular result files
            + Images and plots
            + etc.

The following parameter files are included in each workflow template: 

1. _`targets.txt`_: initial one provided by user; downstream _`targets_*.txt`_ files are generated automatically
2. _`*.param`_: defines parameter for input/output file operations, _e.g._ _`trim.param`_, _`bwa.param`_, _`hisat2.param`_, ...
3. _`*_run.sh`_: optional bash script, _e.g._: _`gatk_run.sh`_
4. Compute cluster environment (skip on single machine):
    + _`.batchtools.conf.R`_: defines type of scheduler for _`batchtools`_. Note, it is necessary to point the right template accordingly to the cluster in use.
    + _`*.tmpl`_: specifies parameters of scheduler used by a system, _e.g._ Torque, SGE, Slurm, etc.

## Run workflow

Next, run the chosen sample workflow _`systemPipeRIBOseq`_ ([.Rmd](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRIBOseq.Rmd)) by executing from the command-line _`make -B`_ within the _`ribseq`_ directory. Alternatively, one can run the code from the provided _`*.Rmd`_ template file from within R interactively. 

Workflow includes following steps:

1. Read preprocessing
    + Adaptor trimming and quality filtering
    + FASTQ quality report
2. Alignments: _`HISAT2`_ (or any other RNA-Seq aligner)
3. Alignment stats
4. Compute read distribution across genomic features
5. Adding custom features to workflow (e.g. uORFs)
6. Genomic read coverage along transcripts
7. Read counting 
8. Sample-wise correlation analysis
9. Analysis of differentially expressed genes (DEGs)
10. GO term enrichment analysis
11. Gene-wise clustering
12. Differential ribosome binding (translational efficiency)

### Render report in HTML and PDF format

```{r render_report, eval=FALSE}
rmarkdown::render("systemPipeRIBOseq.Rmd", "html_document")
rmarkdown::render("systemPipeRIBOseq.Rmd", "pdf_document")
```

# Version Information

```{r sessionInfo}
sessionInfo()
```

# Funding

This research was funded by National Science Foundation Grants IOS-0750811 and
MCB-1021969, and a Marie Curie European Economic Community Fellowship
PIOF-GA-2012-327954.