---
title: "VAR-Seq Workflow Template"
author: "Author: Daniela Cassol (danielac@ucr.edu) and Thomas Girke (thomas.girke@ucr.edu)"
date: "Last update: `r format(Sys.time(), '%d %B, %Y')`"
output:
  BiocStyle::html_document:
    toc_float: true
    code_folding: show
  BiocStyle::pdf_document: default
package: systemPipeR
vignette: |
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{VAR-Seq Workflow Template}
  %\VignetteEngine{knitr::rmarkdown}
fontsize: 14pt
bibliography: bibtex.bib
---

```{css, echo=FALSE}
pre code {
white-space: pre !important;
overflow-x: scroll !important;
word-break: keep-all !important;
word-wrap: initial !important;
}
```

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
options(width=60, max.print=1000)
knitr::opts_chunk$set(
    eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
    cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE")),
    tidy.opts=list(width.cutoff=60), tidy=TRUE)
```

```{r setup, echo=FALSE, message=FALSE, warning=FALSE}
suppressPackageStartupMessages({
    library(systemPipeR)
    library(BiocParallel)
    library(Biostrings)
    library(Rsamtools)
    library(GenomicRanges)
    library(ggplot2)
    library(GenomicAlignments)
    library(ShortRead)
    library(ape)
    library(batchtools)
})
```

# VAR-Seq Workflow

This workflow demonstrates how to use various utilities for building and running automated end-to-end analysis workflows for _`VAR-Seq`_ data.
The full workflow can be found here: [HTML](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeVARseq.html), [.Rmd](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeVARseq.Rmd), and [.R](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeVARseq.R).

## Loading package and workflow template

Load the _`VAR-Seq`_ sample workflow into your current working directory.

```{r genVar_workflow_single, eval=FALSE}
library(systemPipeRdata)
genWorkenvir(workflow="varseq")
setwd("varseq")
```

The working environment of the sample data loaded in the previous step contains
the following preconfigured directory structure. Directory names are indicated
in _**grey**_. Users can change this structure as needed, but must then adjust
the code in their workflows accordingly.

* _**varseq/**_
    + This is the directory of the R session running the workflow.
    + Run script ( _\*.Rmd_) and sample annotation (_targets.txt_) files are located here.
    + Note that this directory can have any name (_e.g._ _**varseq**_). Changing its name does not require any modifications in the run script(s).
    + Important subdirectories:
        + _**param/**_
            + Stores parameter files such as: _\*.param_, _\*.tmpl_ and _\*\_run.sh_.
        + _**data/**_
            + FASTQ samples
            + Reference FASTA file
            + Annotations
            + etc.
        + _**results/**_
            + Alignment, variant and peak files (BAM, VCF, BED)
            + Tabular result files
            + Images and plots
            + etc.

The following parameter files are included in each workflow template:

1. _`targets.txt`_: the initial file is provided by the user; downstream _`targets_*.txt`_ files are generated automatically
2. _`*.param`_: defines parameters for input/output file operations, _e.g._ _`trim.param`_, _`bwa.param`_, _`hisat2.param`_, ...
3. _`*_run.sh`_: optional bash script, _e.g._ _`gatk_run.sh`_
4. Compute cluster environment (skip on single machine):
    + _`.batchtools.conf.R`_: defines the type of scheduler for _`batchtools`_. Note that it must point to the template that matches the cluster in use.
    + _`*.tmpl`_: specifies the parameters of the scheduler used by a system, _e.g._ Torque, SGE, Slurm, etc.

## Run workflow

Next, run the chosen sample workflow _`systemPipeVARseq`_ ([.Rmd](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeVARseq.Rmd)) by executing _`make -B`_ from the command line within the _`varseq`_ directory.
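
The command-line step can also be issued from an R session. The following is a minimal sketch, not part of the template: it assumes `make` is on the `PATH`, that the working directory is the workflow directory, and that the directory names match the layout listed earlier. The helper `missing_entries()` is hypothetical and only guards against starting the run from the wrong directory.

```{r make_from_r_sketch, eval=FALSE}
# Hypothetical helper (not part of systemPipeR): returns the expected
# entries that are missing from a workflow directory.
missing_entries <- function(dir = ".") {
    expected <- c("param", "data", "results", "targets.txt")
    # file.exists() is TRUE for both files and directories
    expected[!file.exists(file.path(dir, expected))]
}

absent <- missing_entries(".")
if (length(absent) > 0) {
    warning("Missing from workflow directory: ",
            paste(absent, collapse = ", "))
} else {
    # Same as typing `make -B` in a shell; -B rebuilds all targets
    # unconditionally.
    system2("make", args = "-B")
}
```
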
Alternatively, one can run the code from the provided _`*.Rmd`_ template file interactively from within R.

The workflow includes the following steps:

1. Read preprocessing
    + Quality filtering (trimming)
    + FASTQ quality report
2. Alignments: _`gsnap`_, _`bwa`_
3. Variant calling: _`VariantTools`_, _`GATK`_, _`BCFtools`_
4. Variant filtering: _`VariantTools`_ and _`VariantAnnotation`_
5. Variant annotation: _`VariantAnnotation`_
6. Combine results from many samples
7. Summary statistics of samples

# Version Information

```{r sessionInfo}
sessionInfo()
```

# Funding

This project was supported by funds from the National Institutes of Health (NIH) and the National Science Foundation (NSF).