--- title: "Generic Workflow Template" author: "Author: FirstName LastName" date: "Last update: `r format(Sys.time(), '%d %B, %Y')`" output: BiocStyle::html_document: toc_float: true code_folding: show BiocStyle::pdf_document: default package: systemPipeR vignette: | %\VignetteEncoding{UTF-8} %\VignetteIndexEntry{WF: Basic Generic Template} %\VignetteEngine{knitr::rmarkdown} fontsize: 14pt bibliography: bibtex.bib --- ```{css, echo=FALSE} pre code { white-space: pre !important; overflow-x: scroll !important; word-break: keep-all !important; word-wrap: initial !important; } ``` ```{r style, echo = FALSE, results = 'asis'} BiocStyle::markdown() options(width=60, max.print=1000) knitr::opts_chunk$set( eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")), cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE")), tidy.opts=list(width.cutoff=60), tidy=TRUE) ``` ```{r setup, echo=FALSE, message=FALSE, warning=FALSE, eval=FALSE} suppressPackageStartupMessages({ library(systemPipeR) }) ``` # Workflow environment This is a _Generic_ workflow template for building new workflows. It is provided by [systemPipeRdata](https://bioconductor.org/packages/devel/data/experiment/html/systemPipeRdata.html), a companion package to [systemPipeR](https://www.bioconductor.org/packages/devel/bioc/html/systemPipeR.html) [@H_Backman2016-bt]. Similar to other `systemPipeR` workflow templates, a single command generates the necessary working environment. This includes the expected [directory structure](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#3_Directory_structure) for executing `systemPipeR` workflows and parameter files for running command-line (CL) software utilized in specific analysis steps. In-depth information can be found in the main vignette of [systemPipeRdata](https://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRdata.html). The _Generic_ template presented here is special that it provides a workflow skelleton intended to be used as a starting point for building new workflows. Basic workflow steps are included to illustrate how to design command-line (CL) and R-based workflow steps, as well as R Markdown code chunks that are not part of a workflow. For more comprehensive information on designing and executing workflows, users want to refer to the main vignettes of [systemPipeR](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html) and [systemPipeRdata](https://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRdata.html). The details about contructing workflow steps are explained in the [Detailed Tutorial](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#5_Detailed_tutorial) section of `systemPipeR's` main vignette that uses the same workflow steps as the _Generic_ workflow template. The `Rmd` file (`new.Rmd`) associated with this vignette serves a dual purpose. It acts both as a template for executing the workflow and as a template for generating a reproducible scientific analysis report. Thus, users want to customize the text (and/or code) of this or other `systemPipeR` workflow vignettes to describe their experimental design and analysis results. This typically involves deleting the instructions how to work with this workflow, and customizing the text describing experimental designs, other metadata and analysis results. The `Generic` workflow template includes the following four data processing steps. 1. R step: export tabular data to files 2. CL step: compress files 3. CL step: uncompress files 4. R step: import files and plot summary statistics The topology graph of this workflow template is shown in Figure 1. ```{r spblast-toplogy, eval=TRUE, warning= FALSE, echo=FALSE, out.width="100%", fig.align = "center", fig.cap= "Topology graph of this workflow template.", warning=FALSE} knitr::include_graphics("results/plotwf_new.png") ``` ## Create workflow environment The environment of the chosen workflow is generated with the `genWorenvir` function. After this, the user’s R session needs to be directed into the resulting directory (here `new`). ```{r genNew_wf, eval=FALSE} systemPipeRdata::genWorkenvir(workflow = "new", mydirname = "new") setwd("new") ``` The `SPRproject` function initializes a new workflow project instance. This function call creates a an empty `SAL` workflow container and at the same time a linked project log directory (default name `.SPRproject`) that acts as a flat-file database of a workflow. For additional details, please visit this [section](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#5_Detailed_tutorial) in `systemPipeR's` main vignette. ```{r create_workflow, message=FALSE, eval=FALSE} library(systemPipeR) sal <- SPRproject() sal ``` ## Construct workflow This section illustrates how to load the following five workflow steps into a `SAL` workflow container (`SYSargsList`) first one-by-one in interactive mode (see [here](#stepwise)) or with the `importWF` command (see [here](#importwf)), and then run the workflow with the `runWF` command. ### Step 1: Load packages {#stepwise} Next, the `systemPipeR` package needs to be loaded in a workflow. ```{r load_library, eval=FALSE, spr=TRUE} appendStep(sal) <- LineWise( code = { library(systemPipeR) }, step_name = "load_library" ) ``` After adding the R code, sal contains now one workflow step. ```{r view_sal, message=FALSE, eval=FALSE} sal ``` ### Step 2: Export tabular data to files This is the first data processing step. In this case it is an R step that uses the `LineWise` function to define the workflow step, and appends it to the `SAL` workflow container. ```{r export_iris, eval=FALSE, spr=TRUE} appendStep(sal) <- LineWise(code={ mapply( FUN = function(x, y) write.csv(x, y), x = split(iris, factor(iris$Species)), y = file.path("results", paste0(names(split(iris, factor(iris$Species))), ".csv")) ) }, step_name = "export_iris", dependency = "load_library" ) ``` ### Step 3: Compress data The following adds a CL step that uses the `gzip` software to compress the files that were generated in the previous step. ```{r gzip, eval=FALSE, spr=TRUE, spr.dep=TRUE} targetspath <- system.file("extdata/cwl/gunzip", "targets_gunzip.txt", package = "systemPipeR") appendStep(sal) <- SYSargsList( targets = targetspath, dir = TRUE, wf_file = "gunzip/workflow_gzip.cwl", input_file = "gunzip/gzip.yml", dir_path = "param/cwl", inputvars = c(FileName = "_FILE_PATH_", SampleName = "_SampleName_"), step_name = "gzip", dependency = "export_iris" ) ``` ### Step 4: Uncompress data Next, the output files (here compressed `gz` files), that were generated by the previous `gzip` step, will be uncompressed in the current step with the `gunzip` software. ```{r gunzip, eval=FALSE, spr=TRUE} appendStep(sal) <- SYSargsList( targets = "gzip", dir = TRUE, wf_file = "gunzip/workflow_gunzip.cwl", input_file = "gunzip/gunzip.yml", dir_path = "param/cwl", inputvars = c(gzip_file = "_FILE_PATH_", SampleName = "_SampleName_"), rm_targets_col = "FileName", step_name = "gunzip", dependency = "gzip" ) ``` ### Step 5: Import tabular files and visualize data Imports the tabular files from the previous step back into R, performs some summary statistics and plots the results as bar diagrams. ```{r stats, eval=FALSE, spr=TRUE} appendStep(sal) <- LineWise(code={ # combine all files into one data frame df <- lapply(getColumn(sal, step="gunzip", 'outfiles'), function(x) read.delim(x, sep=",")[-1]) df <- do.call(rbind, df) # calculate mean and sd for each species stats <- data.frame(cbind(mean=apply(df[,1:4], 2, mean), sd=apply(df[,1:4], 2, sd))) stats$species <- rownames(stats) # plot plot <- ggplot2::ggplot(stats, ggplot2::aes(x=species, y=mean, fill=species)) + ggplot2::geom_bar(stat = "identity", color="black", position=ggplot2::position_dodge()) + ggplot2::geom_errorbar( ggplot2::aes(ymin=mean-sd, ymax=mean+sd), width=.2, position=ggplot2::position_dodge(.9) ) plot }, step_name = "stats", dependency = "gunzip", run_step = "optional" ) ``` ### Version Information ```{r sessionInfo, eval=FALSE, spr=TRUE} appendStep(sal) <- LineWise( code = { sessionInfo() }, step_name = "sessionInfo", dependency = "stats") ``` # Automated routine {#importwf} Once the above steps have been loaded into `sal`, the workflow can be executed from start to finish (or partially) with the `runWF` command. Subsequently, scientific and technical workflow reports can be generated with the `renderReport` and `renderLogs` functions, respectively. The following code section also demonstrates how the above workflow steps can be imported with the `importWF` function from the associated `Rmd` workflow script (here `new.Rmd`). Constructing workflow instances with this automated approach is usually preferred since it is much more convenient and reliable compared to the manual approach described earlier. __Note:__ To demonstrate the 'systemPipeR's' automation routines without regenerating a new workflow environment from scratch, the first line below uses the `overwrite=TRUE` option of the `SPRproject` function. This option is generally discouraged as it erases the existing workflow project and `sal` container. For information on resuming and restarting workflow runs, users want to consult the relevant section of the main vignette (see [here](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#10_Restarting_and_resetting_workflows).) ```{r , import_run_routine, eval=FALSE} sal <- SPRproject(overwrite = TRUE) # Avoid 'overwrite=TRUE' in real runs. sal <- importWF(sal, file_path = "new.Rmd") # Imports above steps from new.Rmd. sal <- runWF(sal) # Runs workflow. plotWF(sal) # Plots workflow topology graph sal <- renderReport(sal) # Renders scientific report. sal <- renderLogs(sal) # Renders technical report from log files. ``` ## CL tools used The `listCmdTools` (and `listCmdModules`) return the CL tools that are used by a workflow. To include a CL tool list in a workflow report, one can use the following code. Additional details on this topic can be found in the main vignette [here](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#111_Accessor_methods). ```{r list_tools} if(file.exists(file.path(".SPRproject", "SYSargsList.yml"))) { local({ sal <- systemPipeR::SPRproject(resume = TRUE) systemPipeR::listCmdTools(sal) systemPipeR::listCmdModules(sal) }) } else { cat(crayon::blue$bold("Tools and modules required by this workflow are:\n")) cat(c("gzip", "gunzip"), sep = "\n") } ``` ## Session Info This is the session information that will be included when rendering this report. ```{r report_session_info, eval=TRUE} sessionInfo() ``` # References