---
title: "Generic Workflow Template" 
author: "Author: FirstName LastName"
date: "Last update: `r format(Sys.time(), '%d %B, %Y')`" 
output:
  BiocStyle::html_document:
    toc_float: true
    code_folding: show
  BiocStyle::pdf_document: default
package: systemPipeR
vignette: |
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{WF: Basic Generic Template}
  %\VignetteEngine{knitr::rmarkdown}
fontsize: 14pt
bibliography: bibtex.bib
---

```{css, echo=FALSE}
pre code {
white-space: pre !important;
overflow-x: scroll !important;
word-break: keep-all !important;
word-wrap: initial !important;
}
```

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
options(width=60, max.print=1000)
knitr::opts_chunk$set(
    eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
    cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE")), 
    tidy.opts=list(width.cutoff=60), tidy=TRUE)
```

```{r setup, echo=FALSE, message=FALSE, warning=FALSE, eval=FALSE}
suppressPackageStartupMessages({
    library(systemPipeR)
})
```

# Workflow environment

This is a _Generic_ workflow template for building new workflows. It is provided by
[systemPipeRdata](https://bioconductor.org/packages/devel/data/experiment/html/systemPipeRdata.html), 
a companion package to [systemPipeR](https://www.bioconductor.org/packages/devel/bioc/html/systemPipeR.html) [@H_Backman2016-bt].
Similar to other `systemPipeR` workflow templates, a single command generates
the necessary working environment. This includes the expected [directory structure](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#3_Directory_structure) 
for executing `systemPipeR` workflows and parameter files for running
command-line (CL) software utilized in specific analysis steps. 
In-depth information can be found in the main vignette of [systemPipeRdata](https://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRdata.html). 
The _Generic_ template presented here is special in that it provides a workflow
skeleton intended to be used as a starting point for building new workflows.
Basic workflow steps are included to illustrate how to design command-line (CL)
and R-based workflow steps, as well as R Markdown code chunks that are not part
of a workflow. For more comprehensive information on designing
and executing workflows, users should refer to the main vignettes of
[systemPipeR](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html)
and
[systemPipeRdata](https://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRdata.html). 
The details of constructing workflow steps are explained in the 
[Detailed Tutorial](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#5_Detailed_tutorial) section 
of `systemPipeR`'s main vignette, which uses the same workflow steps as the _Generic_ workflow template.

The `Rmd` file (`new.Rmd`) associated with this vignette serves a dual purpose.
It acts both as a template for executing the workflow and as a template for
generating a reproducible scientific analysis report. Thus, users should
customize the text (and/or code) of this or other `systemPipeR` workflow vignettes to describe their
experimental design and analysis results. This typically involves deleting the
instructions on how to work with this workflow, and customizing the text
describing experimental designs, other metadata and analysis results.

The `Generic` workflow template includes the following four data processing steps.

1. R step: export tabular data to files 
2. CL step: compress files
3. CL step: uncompress files 
4. R step: import files and plot summary statistics

The topology graph of this workflow template is shown in Figure 1.

```{r spblast-toplogy, eval=TRUE, warning=FALSE, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Topology graph of this workflow template."}
knitr::include_graphics("results/plotwf_new.png")
```

## Create workflow environment

The environment of the chosen workflow is generated with the `genWorkenvir` 
function. After this, the user’s R session needs to be directed into the resulting directory
(here `new`).

```{r genNew_wf, eval=FALSE}
systemPipeRdata::genWorkenvir(workflow = "new", mydirname = "new")
setwd("new")
```
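
As a quick sanity check after changing into the new directory, one can confirm
that the expected subdirectories are present. The following is a minimal sketch
in base R. The helper name `check_workenvir` is hypothetical (it is not part of
`systemPipeR`), and the assumed directory names (`data`, `param`, `results`)
follow the directory structure described in the main vignette. A temporary
directory stands in for the real workflow environment so that the example is
self-contained.

```{r check_workenvir_sketch, eval=FALSE}
# Hypothetical helper (not part of systemPipeR): reports which of the
# expected subdirectories exist under 'path'.
check_workenvir <- function(path = ".", expected = c("data", "param", "results")) {
    present <- dir.exists(file.path(path, expected))
    names(present) <- expected
    present
}

# Stand-in for a generated workflow environment (assumption for this demo).
demo_dir <- file.path(tempdir(), "new_demo")
for (d in c("data", "param", "results")) {
    dir.create(file.path(demo_dir, d), recursive = TRUE, showWarnings = FALSE)
}
check_workenvir(demo_dir)
```

In a real session, calling `check_workenvir()` without arguments right after
`setwd("new")` would inspect the current working directory instead.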

The `SPRproject` function initializes a new workflow project instance. This function
call creates an empty `SAL` workflow container and at the same time a
linked project log directory (default name `.SPRproject`) that acts as a flat-file 
database of a workflow. For additional details, please visit this
[section](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#5_Detailed_tutorial)
in `systemPipeR`'s main vignette.  

```{r create_workflow, message=FALSE, eval=FALSE}
library(systemPipeR)
sal <- SPRproject()
sal
```

## Construct workflow

This section illustrates how to load the following five workflow steps into a
`SAL` workflow container (`SYSargsList`), either one-by-one in interactive mode
(see [here](#stepwise)) or with the `importWF` command (see [here](#importwf)),
and how to then run the workflow with the `runWF` command. 


### Step 1: Load packages {#stepwise}

The first step loads the `systemPipeR` package within the workflow. 

```{r load_library, eval=FALSE, spr=TRUE}
appendStep(sal) <- LineWise(
    code = {
    library(systemPipeR)
    }, 
    step_name = "load_library"
)
```

After adding the R code, `sal` now contains one workflow step.

```{r view_sal, message=FALSE, eval=FALSE}
sal
```

### Step 2: Export tabular data to files

This is the first data processing step. In this case it is an R step that uses the `LineWise` 
function to define the workflow step, and appends it to the `SAL` workflow container.

```{r export_iris, eval=FALSE, spr=TRUE}
appendStep(sal) <- LineWise(code={
    mapply(
      FUN = function(x, y) write.csv(x, y),
      x = split(iris, factor(iris$Species)),
      y = file.path("results", paste0(names(split(iris, factor(iris$Species))), ".csv"))
    )
    }, 
  step_name = "export_iris", 
  dependency = "load_library"
)
```
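
To see what this step produces without running the workflow, the same export
logic can be tried directly in an R session. The sketch below is a standalone
variant that writes to a temporary directory instead of `results` (an
assumption made here to avoid touching a project directory) and splits the
data only once.

```{r export_iris_sketch, eval=FALSE}
# Write one CSV file per species into a temporary directory
# (the workflow step itself writes to 'results').
out_dir <- file.path(tempdir(), "export_iris_demo")
dir.create(out_dir, showWarnings = FALSE)

by_species <- split(iris, factor(iris$Species))
out_files <- file.path(out_dir, paste0(names(by_species), ".csv"))

# invisible() suppresses the list of NULLs returned by mapply().
invisible(mapply(function(x, y) write.csv(x, y), by_species, out_files))

basename(out_files)
```

Each resulting file holds the rows of one species, with the original row names
stored in the first column.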

### Step 3: Compress data

The following adds a CL step that uses the `gzip` software to compress the files that were 
generated in the previous step.

```{r gzip, eval=FALSE, spr=TRUE, spr.dep=TRUE}
targetspath <- system.file("extdata/cwl/gunzip", "targets_gunzip.txt", package = "systemPipeR")
appendStep(sal) <- SYSargsList(
    targets = targetspath, dir = TRUE,
    wf_file = "gunzip/workflow_gzip.cwl", input_file = "gunzip/gzip.yml",
    dir_path = "param/cwl",
    inputvars = c(FileName = "_FILE_PATH_", SampleName = "_SampleName_"), 
    step_name = "gzip", 
    dependency = "export_iris"
)
```
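
The `inputvars` argument maps column names of the targets file to placeholder
strings (here `_FILE_PATH_` and `_SampleName_`) used in the `yml` parameter
file. The following base R sketch illustrates only the idea of this
substitution. It is not `systemPipeR`'s implementation, and the toy targets
table, the template lines and the helper name `fill_template` are made up for
illustration.

```{r inputvars_sketch, eval=FALSE}
# Toy targets table (made up for illustration).
targets <- data.frame(
    FileName = c("results/setosa.csv", "results/versicolor.csv"),
    SampleName = c("SE", "VE"),
    stringsAsFactors = FALSE
)

# Toy parameter template containing the two placeholders.
template <- c("file: _FILE_PATH_", "SampleName: _SampleName_")

# Same mapping as in the step above: targets column -> placeholder.
inputvars <- c(FileName = "_FILE_PATH_", SampleName = "_SampleName_")

# Hypothetical helper: fills the template with the values of one targets row.
fill_template <- function(row, template, inputvars) {
    for (col in names(inputvars)) {
        template <- gsub(inputvars[[col]], row[[col]], template, fixed = TRUE)
    }
    template
}

# Fill the template once per targets row.
filled <- lapply(seq_len(nrow(targets)), function(i) {
    fill_template(targets[i, ], template, inputvars)
})
names(filled) <- targets$SampleName
filled[["SE"]]
```

Conceptually, each row of the targets table yields one filled-in parameter set,
and therefore one command-line call per sample.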

### Step 4: Uncompress data

Next, the output files generated by the previous `gzip` step (here compressed
`gz` files) will be uncompressed in the current step with the `gunzip`
software. 

```{r gunzip, eval=FALSE, spr=TRUE}
appendStep(sal) <- SYSargsList(
    targets = "gzip", dir = TRUE,
    wf_file = "gunzip/workflow_gunzip.cwl", input_file = "gunzip/gunzip.yml",
    dir_path = "param/cwl",
    inputvars = c(gzip_file = "_FILE_PATH_", SampleName = "_SampleName_"), 
    rm_targets_col = "FileName", 
    step_name = "gunzip", 
    dependency = "gzip"
)
```
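
To make the effect of the compress/uncompress pair concrete without invoking
the CL tools, the following sketch performs a comparable round trip with base
R's `gzfile` connections. This is an illustration only: the workflow steps
above call `gzip` and `gunzip` through their CWL definitions. The file names
are temporary and made up for this example.

```{r gzip_roundtrip_sketch, eval=FALSE}
# Write a small CSV file to a temporary location.
csv_file <- tempfile(fileext = ".csv")
write.csv(head(iris), csv_file, row.names = FALSE)
original <- readLines(csv_file)

# Compress: write the lines through a gzip connection.
gz_file <- paste0(csv_file, ".gz")
con <- gzfile(gz_file, open = "wt")
writeLines(original, con)
close(con)

# Uncompress: read the lines back through a gzip connection.
con <- gzfile(gz_file, open = "rt")
restored <- readLines(con)
close(con)

# The round trip should reproduce the original content.
identical(original, restored)
```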

### Step 5: Import tabular files and visualize data

This step imports the tabular files from the previous step back into R, computes 
summary statistics and plots the results as a bar plot with error bars.

```{r stats, eval=FALSE, spr=TRUE}
appendStep(sal) <- LineWise(code={
    # combine all files into one data frame
    df <- lapply(getColumn(sal, step="gunzip", 'outfiles'), function(x) read.delim(x, sep=",")[-1])
    df <- do.call(rbind, df)
    # calculate mean and sd for each of the four measurement columns
    stats <- data.frame(cbind(mean=apply(df[,1:4], 2, mean), sd=apply(df[,1:4], 2, sd)))
    stats$species <- rownames(stats)
    # plot
    plot <- ggplot2::ggplot(stats, ggplot2::aes(x=species, y=mean, fill=species)) + 
        ggplot2::geom_bar(stat = "identity", color="black", position=ggplot2::position_dodge()) +
        ggplot2::geom_errorbar(
            ggplot2::aes(ymin=mean-sd, ymax=mean+sd), 
            width=.2,
            position=ggplot2::position_dodge(.9)
        )
    plot
    }, 
    step_name = "stats", 
    dependency = "gunzip", 
    run_step = "optional"
)
```
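
The summary statistics of this step can be explored outside of the workflow
with the built-in `iris` data. The sketch below assumes that the combined data
frame has the same columns as `iris`, which holds here because the files were
exported from it. Note that the statistics are computed per measurement column
across all rows, so the column named `species` ends up holding the measurement
names.

```{r stats_sketch, eval=FALSE}
# Use the built-in iris data directly instead of the uncompressed files.
df <- iris

# Column-wise mean and standard deviation of the four measurements.
stats <- data.frame(
    mean = apply(df[, 1:4], 2, mean),
    sd = apply(df[, 1:4], 2, sd)
)

# As in the step above, the row names (measurement names) are stored
# in a column called 'species'.
stats$species <- rownames(stats)
stats
```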

### Version Information

```{r sessionInfo, eval=FALSE, spr=TRUE}
appendStep(sal) <- LineWise(
    code = {
    sessionInfo()
    }, 
    step_name = "sessionInfo", 
    dependency = "stats")
```

# Automated routine {#importwf}

Once the above steps have been loaded into `sal`, the workflow can be executed from start to
finish (or partially) with the `runWF` command. Subsequently, scientific and technical workflow 
reports can be generated with the `renderReport` and `renderLogs` functions, respectively.

The following code section also demonstrates how the above workflow steps can be imported with 
the `importWF` function from the associated `Rmd` workflow script (here `new.Rmd`). Constructing 
workflow instances with this automated approach is usually preferred since it is much more convenient 
and reliable compared to the manual approach described earlier. 

__Note:__ To demonstrate `systemPipeR`'s automation routines without regenerating a new workflow 
environment from scratch, the first line below uses the `overwrite=TRUE` option of the `SPRproject` function. 
This option is generally discouraged as it erases the existing workflow project and `sal` container. 
For information on resuming and restarting workflow runs, users should consult the relevant section of 
the main vignette (see [here](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#10_Restarting_and_resetting_workflows)).

```{r import_run_routine, eval=FALSE}
sal <- SPRproject(overwrite = TRUE) # Avoid 'overwrite=TRUE' in real runs.
sal <- importWF(sal, file_path = "new.Rmd") # Imports above steps from new.Rmd.
sal <- runWF(sal) # Runs workflow.
plotWF(sal) # Plots workflow topology graph
sal <- renderReport(sal) # Renders scientific report.
sal <- renderLogs(sal) # Renders technical report from log files.
```
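
In the step definitions above, the code chunks that belong to the workflow
carry the chunk option `spr=TRUE`. As a rough illustration of how such chunks
can be told apart from ordinary ones, the following base R sketch scans chunk
headers with a regular expression. It is a simplification for illustration and
not the parser used by `importWF`. The helper name `spr_chunk_labels` and the
toy input are hypothetical.

```{r spr_chunk_labels_sketch, eval=FALSE}
# Hypothetical helper: returns the labels of R chunks whose header
# contains the option spr=TRUE (spaces around '=' are allowed).
spr_chunk_labels <- function(lines) {
    headers <- grep("^```\\{r[ ,]", lines, value = TRUE)
    is_spr <- grepl("spr[[:space:]]*=[[:space:]]*TRUE", headers)
    # The label is the first token after '{r'.
    sub("^```\\{r[[:space:]]+([^,} ]+).*$", "\\1", headers[is_spr])
}

# Toy Rmd content; the fence is built with strrep() so that this example
# does not itself contain lines that look like chunk delimiters.
fence <- strrep("`", 3)
rmd <- c(
    paste0(fence, "{r load_library, eval=FALSE, spr=TRUE}"),
    "library(systemPipeR)",
    fence,
    paste0(fence, "{r view_sal, message=FALSE, eval=FALSE}"),
    "sal",
    fence,
    paste0(fence, "{r gzip, eval=FALSE, spr=TRUE, spr.dep=TRUE}"),
    fence
)
spr_chunk_labels(rmd)
```

Applied to the toy input, only the `load_library` and `gzip` chunks are
reported, since `view_sal` lacks the `spr=TRUE` option.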

## CL tools used 

The `listCmdTools` (and `listCmdModules`) functions return the CL tools that 
are used by a workflow. To include a CL tool list in a workflow report, 
one can use the following code. Additional details on this topic 
can be found in the main vignette [here](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#111_Accessor_methods).

```{r list_tools}
if(file.exists(file.path(".SPRproject", "SYSargsList.yml"))) {
    local({
        sal <- systemPipeR::SPRproject(resume = TRUE)
        systemPipeR::listCmdTools(sal)
        systemPipeR::listCmdModules(sal)
    })
} else {
    cat(crayon::blue$bold("Tools and modules required by this workflow are:\n"))
    cat(c("gzip", "gunzip"), sep = "\n")
}
```
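
Before running the workflow, it can be useful to confirm that the required CL
tools are available on the system `PATH`. A minimal sketch using base R's
`Sys.which` follows. The tool names are the two listed for this workflow, and
the helper name `tools_available` is hypothetical.

```{r tools_available_sketch, eval=FALSE}
# Hypothetical helper: TRUE for each tool that is found on the PATH.
tools_available <- function(tools) {
    found <- Sys.which(tools)
    setNames(nzchar(found), tools)
}
tools_available(c("gzip", "gunzip"))
```

A `FALSE` entry indicates that the corresponding tool needs to be installed, or
loaded via an environment module, before the CL steps can run.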

## Session Info

This is the session information that will be included when rendering this report. 

```{r report_session_info, eval=TRUE}
sessionInfo()
```

# References