---
title: 'SWATH2stats example script'
output: 
  pdf_document: 
    keep_tex: yes
vignette: >
  %\VignetteIndexEntry{SWATH2stats example script}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

Example R code showing the usage of the SWATH2stats package. The data processed is the publicly available dataset of *S.pyogenes* (Röst et al. 2014) (http://www.peptideatlas.org/PASS/PASS00289). The results file 'rawOpenSwathResults\_1pcnt\_only.tsv' can be found on PeptideAtlas (ftp://PASS00289@ftp.peptideatlas.org/../Spyogenes/results/).
This is a R Markdown file, showing the result of processing this data. The lines shaded in grey represent the R code executed during this analysis.

The SWATH2stats package can be directly installed from Bioconductor using the commands below (http://bioconductor.org/packages/devel/bioc/html/SWATH2stats.html).


```{r, eval=FALSE}
if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("SWATH2stats")
```

# Part 1: Loading and annotation

Load the SWATH-MS example data from the package, this is a reduced file in order to limit the file size of the package. 

```{r}
library(SWATH2stats)
library(data.table)
data('Spyogenes', package = 'SWATH2stats')
```

Alternatively the original file downloaded from the Peptide Atlas can be loaded from the working directory.

```{r, eval=FALSE}
data <- data.frame(fread('rawOpenSwathResults_1pcnt_only.tsv', sep='\t', header=TRUE))
```

Extract the study design information from the file names. Alternatively, the study design table can  be provided as an external table.

```{r, tidy=TRUE}
Study_design <- data.frame(Filename = unique(data$align_origfilename))
Study_design$Filename <- gsub('.*strep_align/(.*)_all_peakgroups.*', '\\1', Study_design$Filename)
Study_design$Condition <- gsub('(Strep.*)_Repl.*', '\\1', Study_design$Filename)
Study_design$BioReplicate <- gsub('.*Repl([[:digit:]])_.*', '\\1', Study_design$Filename)
Study_design$Run <- seq_len(nrow(Study_design))
head(Study_design)
```

The SWATH-MS data is annotated using the study design table.
```{r}
data.annotated <- sample_annotation(data, Study_design, column_file = "align_origfilename")
```

Remove the decoy peptides for a subsequent inspection of the data.
```{r}
data.annotated.nodecoy <- subset(data.annotated, decoy==FALSE)
```


\newpage

# Part 2: Analyze correlation, variation and signal
Count the different analytes for the different injections.
```{r}
count_analytes(data.annotated.nodecoy)
```

Plot the correlation of the signal intensity.
```{r, fig.height=2.5, fig.width = 6}
correlation <- plot_correlation_between_samples(data.annotated.nodecoy, column.values = 'Intensity')
```

Plot the correlation of the delta_rt, which is the deviation of the retention time from the expected retention time.
```{r, fig.height=2.5, fig.width = 6}
correlation <- plot_correlation_between_samples(data.annotated.nodecoy, column.values = 'delta_rt')
```
\newpage 


Plot the variation of the signal across replicates.
```{r, fig.height=2.5, fig.width = 5.5}
variation <- plot_variation(data.annotated.nodecoy)
variation[[2]]
```

Plot the total variation versus variation within replicates.
```{r, fig.height=2.5, fig.width = 5.5}
variation_total <- plot_variation_vs_total(data.annotated.nodecoy)
variation_total[[2]]
```

Calculate the summed signal per peptide and protein across samples.
```{r}
peptide_signal <- write_matrix_peptides(data.annotated.nodecoy)
protein_signal <- write_matrix_proteins(data.annotated.nodecoy)
head(protein_signal)
```
\newpage


# Part 3: FDR estimation
Estimate the overall FDR across runs using a target decoy strategy.
```{r, fig.height = 3.5}
par(mfrow = c(1, 3))
fdr_target_decoy <- assess_fdr_overall(data.annotated, n_range = 10, 
                                       FFT = 0.25, output = 'Rconsole')

```
According to this FDR estimation one would need to filter the data with a lower mscore threshold to reach an overall protein FDR of 5%. 
```{r}
mscore4protfdr(data, FFT = 0.25, fdr_target = 0.05)
```

# Part 4: Filtering
Filter data for values that pass the 0.001 mscore criteria in at least two replicates of one condition.
```{r}
data.filtered <- filter_mscore_condition(data.annotated, 0.001, n_replica = 2)
```

Select only the 10 peptides showing strongest signal per protein.
```{r}
data.filtered2 <- filter_on_max_peptides(data.filtered, n_peptides = 10)
```
\newpage 


Filter for proteins that are supported by at least two peptides.
```{r}
data.filtered3 <- filter_on_min_peptides(data.filtered2, n_peptides = 2)
```


# Part 5: Conversion
Convert the data into a transition-level format (one row per transition measured). 

```{r}
data.transition <- disaggregate(data.filtered3)
```

Convert the data into the format required by MSstats.
```{r}
MSstats.input <- convert4MSstats(data.transition)
head(MSstats.input)
```

Convert the data into the format required by mapDIA.
```{r}
mapDIA.input <- convert4mapDIA(data.transition)
head(mapDIA.input)
```

Convert the data into the format required by aLFQ.
```{r}
aLFQ.input <- convert4aLFQ(data.transition)
head(aLFQ.input)
```

Session info on the R version and packages used.
```{r}
sessionInfo()
```