---
title: "Protein workflow"
author: "Petra Palenikova & Rick Scavetta"
date: "`r Sys.Date()`"
output: 
    rmarkdown::html_vignette:
        toc: true
vignette: >
    %\VignetteIndexEntry{Protein workflow}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    fig.width=7, fig.height=4.5,
    collapse = TRUE,
    eval = TRUE,
    comment = "#>"
)
```

# Introduction

Complexome profiling or complexomics is a mass spectrometry-based method used in
biology to study macromolecular complexes in their native form. Protein 
complexes are assessed by evaluating their migration profile across fractions. 
First, lysed protein sample is separated into fractions, typically by blue 
native electrophoresis (BN-PAGE) or density gradient centrifugation. Individual 
fractions are analysed by protein mass spectrometry. This allows to identify 
protein migration profiles across the fractions and to assess protein 
co-migration. It is often desirable to compare co-migration between multiple 
biological samples. Typically, this would require analysing each sample 
separately, using multiple lanes of blue native gel/multiple density gradients. 
However, this approach might introduce technical biases making the qualitative 
comparison of migration profiles as well as quantitative comparison of protein 
amount between different biological samples difficult. To mitigate these 
technical biases, biological samples can be labeled by means detectable by 
mass spectrometry (e.g. SILAC, TMT, iTRAQ) and analysed simultaneously.  Stable 
Isotope Labelling with Amino acids in Cell culture (SILAC) is a method when 
cells are grown in the presence of amino acids with either low natural abundance
“heavy” isotopes of carbon and nitrogen or the most frequent “light” isotopes.
This labelling allows for reciprocally labelled samples to be mixed and 
multiplexed at the very early steps of experiment, making it a useful tool when 
experimental design requires comparison of 2 biological samples. 

Here we present a package to analyse data produced by SILAC complexomics 
experiments. This package does not interpret raw mass spectrometry data. Protein
workflow of this package uses normalised protein data as an input. It allows to 
performs cluster analysis and contains tools for visualization of results. The
analysis is indented for samples that were SILAC labelled, therefore the input 
file should contain both “heavy” and “light” proteins.

## Method description

### 0) Re-formatting of input file

Input file format was designed so it is easily readable by human. Downstream 
analysis requires a slightly different format so it is necessary to perfomr 
this change before performing the analysis.

### 1) Hierarchical clustering

Clustering allows to identify similarity between migration profiles of proteins 
in an unbiased way. We can check co-migration of known protein complexes by 
simply filtering the data, however, clustering provides additional information
by allowing to identify unknown proteins that show similar migration profile as 
our proteins of interest.

This package contains functions to perform hierarchical clustering using Pearson
correlation (cantered or uncentered) as a distance measure and one of the three 
linkage methods (single, average or complete).

### 2) Export files and visualizations

We provide several functions to export intermediate steps of the analysis.
Plotting functionality includes: 

* proteinPlot - line plot for a selected protein 
* groupHeatMap - heatmap for a selected group of proteins 
* oneGroupTwoLabelsCoMigration - scatter plot for a selected group of proteins
* twoGroupsWithinLabelCoMigration - scatter plot for 2 selected groups of
proteins
* makeBarPlotClusterSummary - bar plot showing number of proteins per cluster

# Example protein workflow

**Read in data, convert to correct format**

```{r}
library(ComPrAn)
inputFile <- system.file("extData", "dataNormProts.txt", package = "ComPrAn")

forAnalysis <- protImportForAnalysis(inputFile)
```

### Visualization of normalised protein data
**Have a look at a selected protein (line plot)**
```{r}
protein <- "P52815"
max_frac <- 23
# example protein plot, quantitative comparison between labeled and unlabeled
# samples (default settings)
proteinPlot(forAnalysis[forAnalysis$scenario == "B",], protein, max_frac)
```

**Make a heatmap for a selected group of proteins**
```{r groupHeatMap, fig.width=7, fig.height=6.7}
groupDataFileName <- system.file("extData","exampleGroup.txt",package="ComPrAn")
groupName <- 'group1'
groupData <- data.table::fread(groupDataFileName)
# example heatmap, quantitative comparison between labeled and unlabeled samples
# (default settings)
groupHeatMap(dataFrame = forAnalysis[forAnalysis$scenario == "B",],
                groupData, groupName)
```

**Co-migration plot of single protein group between label states**
```{r}
groupDataVector <- c("Q16540","P52815","P09001","Q13405","Q9H2W6")
groupName <- 'group1' 
max_frac <- 23 
# example co-migration plot, non-quantitative comparison of migration profile 
# of a sigle protein goup between labeled and unlabeled samples 
# (default settings)
oneGroupTwoLabelsCoMigration(forAnalysis, max_frac = max_frac, 
                                groupDataVector,groupName)

```

**Co-migration plot of two protein groups within label state**
```{r}
group1DataVector <- c("Q16540","P52815","P09001","Q13405","Q9H2W6")
group1Name <- 'group1' 
group2DataVector <- c("Q9NVS2","Q9NWU5","Q9NX20","Q9NYK5","Q9NZE8")
group2Name <- 'group2'
max_frac <- 23 
# example co-migration plot, non-quantitative comparison of migration profile 
# of two protein goups within label states (default settings)
twoGroupsWithinLabelCoMigration(dataFrame = forAnalysis, max_frac = max_frac, 
                                group1Data = group1DataVector, 
                                group1Name = group1Name,
                                group2Data = group2DataVector, 
                                group2Name = group2Name)
```

### Cluster analysis

**Create components neccessary for clustering:**
(distance matrix for labeled and unlabeled samples,
protein table for both samples)
```{r}
clusteringDF <- clusterComp(forAnalysis,scenar = "A", PearsCor = "centered")
```

**Assign clusters to data frames**
```{r}
labTab_clust <- assignClusters(.listDf = clusteringDF,sample = "labeled",
                                    method = 'average', cutoff = 0.85)
unlabTab_clust <- assignClusters(.listDf = clusteringDF,sample = "unlabeled",
                                    method = 'average', cutoff = 0.85)
```

**Make bar plots** summarizing numbers of proteins per cluster for labeled
and unlabeled samples
```{r clusterBar, fig.width=4, fig.height=2.5}
makeBarPlotClusterSummary(labTab_clust, name = 'labeled')
makeBarPlotClusterSummary(unlabTab_clust, name = 'unlabeled')
```

**Create table containing proteins and their assigned clusters**
```{r}
tableForClusterExport <- exportClusterAssignments(labTab_clust,unlabTab_clust)
```

<center>
#### *End of file*
</center>