---
title: "VAR-Seq Workflow Template"
author: "Author: Daniela Cassol (danielac@ucr.edu) and Thomas Girke (thomas.girke@ucr.edu)"
date: "Last update: `r format(Sys.time(), '%d %B, %Y')`"
output:
BiocStyle::html_document:
toc_float: true
code_folding: show
BiocStyle::pdf_document: default
package: systemPipeR
vignette: |
%\VignetteEncoding{UTF-8}
%\VignetteIndexEntry{VAR-Seq Workflow Template}
%\VignetteEngine{knitr::rmarkdown}
fontsize: 14pt
bibliography: bibtex.bib
---
```{css, echo=FALSE}
pre code {
white-space: pre !important;
overflow-x: scroll !important;
word-break: keep-all !important;
word-wrap: initial !important;
}
```
```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
options(width=60, max.print=1000)
knitr::opts_chunk$set(
eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE")),
tidy.opts=list(width.cutoff=60), tidy=TRUE)
```
```{r setup, echo=FALSE, messages=FALSE, warnings=FALSE}
suppressPackageStartupMessages({
library(systemPipeR)
library(BiocParallel)
library(Biostrings)
library(Rsamtools)
library(GenomicRanges)
library(ggplot2)
library(GenomicAlignments)
library(ShortRead)
library(ape)
library(batchtools)
})
```
# VAR-Seq Workflow
This workflow demonstrates how to use various utilities for building and running automated end-to-end analysis workflows for _`VAR-Seq`_ data. The full workflow can be found here: [HTML](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeVARseq.html), [.Rmd](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeVARseq.Rmd), and [.R](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeVARseq.R).
## Loading package and workflow template
Load the _`VAR-Seq`_ sample workflow into your current working directory.
```{r genVar_workflow_single, eval=FALSE}
library(systemPipeRdata)
genWorkenvir(workflow="varseq")
setwd("varseq")
```
The working environment of the sample data loaded in the previous step contains the following preconfigured directory structure. Directory names are indicated in _**grey**_. Users can change this structure as needed, but need to adjust the code in their workflows accordingly.
* _**varseq/**_
+ This is the directory of the R session running the workflow.
+ Run script ( _\*.Rmd_) and sample annotation (_targets.txt_) files are located here.
+ Note, this directory can have any name (_e.g._ _**varseq**_). Changing its name does not require any modifications in the run script(s).
+ Important subdirectories:
+ _**param/**_
+ Stores parameter files such as: _\*.param_, _\*.tmpl_ and _\*\_run.sh_.
+ _**data/**_
+ FASTQ samples
+ Reference FASTA file
+ Annotations
+ etc.
+ _**results/**_
+ Alignment, variant and peak files (BAM, VCF, BED)
+ Tabular result files
+ Images and plots
+ etc.
The following parameter files are included in each workflow template:
1. _`targets.txt`_: initial one provided by user; downstream _`targets_*.txt`_ files are generated automatically
2. _`*.param`_: defines parameter for input/output file operations, _e.g._ _`trim.param`_, _`bwa.param`_, _`hisat2.param`_, ...
3. _`*_run.sh`_: optional bash script, _e.g._: _`gatk_run.sh`_
4. Compute cluster environment (skip on single machine):
+ _`.batchtools.conf.R`_: defines type of scheduler for _`batchtools`_. Note, it is necessary to point the right template accordingly to the cluster in use.
+ _`*.tmpl`_: specifies parameters of scheduler used by a system, _e.g._ Torque, SGE, Slurm, etc.
## Run workflow
Next, run the chosen sample workflow _`systemPipeVARseq`_ ([.Rmd](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeVARseq.Rmd)) by executing from the command-line _`make -B`_ within the _`varseq`_ directory. Alternatively, one can run the code from the provided _`*.Rmd`_ template file from within R interactively.
Workflow includes following steps:
1. Read preprocessing
+ Quality filtering (trimming)
+ FASTQ quality report
2. Alignments: _`gsnap`_, _`bwa`_
3. Variant calling: _`VariantTools`_, _`GATK`_, _`BCFtools`_
4. Variant filtering: _`VariantTools`_ and _`VariantAnnotation`_
5. Variant annotation: _`VariantAnnotation`_
6. Combine results from many samples
7. Summary statistics of samples
# Version Information
```{r sessionInfo}
sessionInfo()
```
# Funding
This project was supported by funds from the National Institutes of
Health (NIH) and the National Science Foundation (NSF).