---
title: "Data visualization"
author: 
  - Luca De Sano
  - Daniele Ramazzotti
  - Giulio Caravagna
  - Alex Graudenxi
  - Marco Antoniotti
date: "`r format(Sys.time(), '%B %d, %Y')`"
graphics: yes
package: TRONCO
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Data visualization}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{TRONCO,BiocStyle}
---


```{r, include=FALSE}
library(TRONCO)
data(aCML)
data(crc_maf)
data(crc_gistic)
data(crc_plain)
```

All examples in this section will be done with the the aCML dataset as reference.

## Summary report for a dataset and boolean queries{#sec:view}

We use the function `view` to get a short summary of a dataset that we loaded in TRONCO; this function reports on the number of samples and  events, plus some meta information that could be displayed graphically.  


```{r}
view(aCML)
```

## Creating views with the `as` functions

Several functions are available to create views over a dataset, with a set of parameter which can constraint the view -- as in the SELECT/JOIN approaches in databases. In the following examples we  show their execution with the default parameters, but  shorten their output to make this document readable.

The main `as` functions are here documented.  `as.genotypes`, that we can use to get the matrix of *genotypes* that we imported.


```{r}
as.genotypes(aCML)[1:10,5:10]
```


Differently, `as.events` and `as.events.in.samples`, that show tables with the events that we are processing in  all dataset or in a specific sample that we want to examine.


```{r}
as.events(aCML)[1:5, ]
as.events.in.sample(aCML, sample = 'patient 2')
```


Concerning genes, `as.genes` shows the mnemonic names of the genes (or chromosomes, cytobands, etc.) that we included in our dataset.


```{r}
as.genes(aCML)[1:8]
```


And `as.types` shows the types of alterations (e.g., mutations, amplifications, etc.) that we have  find in our dataset, and  function `as.colors` shows the list of the  colors which are associated to each type.


```{r}
as.types(aCML)
as.colors(aCML)
```



A function `as.gene` can be used to display the alterations of a specific gene  across the samples


```{r}
head(as.gene(aCML, genes='SETBP1'))
```


Views over samples can be created as well. `as.samples` and `which.samples` list all the samples in the data, or return a list of samples that harbour a certain alteration. The former is 


```{r}
as.samples(aCML)[1:10]
```


and the latter is
 

```{r}
which.samples(aCML, gene='TET2', type='Nonsense point')
```


A slightly different function, which manipulates the data,  is `as.alterations`, which transforms a dataset with events of different type  to events of a unique type, labeled *Alteration*.


```{r}
dataset = as.alterations(aCML)
```

```{r}
view(dataset)
```

 
When samples are enriched with stage information function `as.stages` can be used to create a view over such table. Views over patterns can be created as well -- see Model Inference with CAPRI.

## Dataset size

A set of functions allow to get the number of genes, events, samples, types and patterns in a dataset.


```{r}
ngenes(aCML)
nevents(aCML)
nsamples(aCML)
ntypes(aCML)
npatterns(aCML)
```



## Oncoprints

Oncoprints are the most effective  data-visualization functions in TRONCO. These are heatmaps where rows represent variants, and columns samples (*the reverse* of the input format required by TRONCO), and are annotated and displayed/sorted to enhance which samples have which mutations etc. 

By default `oncoprint` will try to  sort samples and events to enhance exclusivity patterns among the events.


```{r, fig.width=6, fig.height=5, fig.cap="This plot gives a graphical visualization of the events that are in the dataset -- with a color per event type. It sorts samples to enhance exclusivity patterns among the events"}
oncoprint(aCML)
```


But the sorting mechanism is bypassed if one wants to cluster samples or events, or if one wants to split samples by cluster (not shown). In the clustering case, the ordering is given by the dendrograms. In this case we also show the annotation of some groups of events via parameter `gene.annot`.


```{r fig.width=5, fig.height=5, fig.cap="This plot gives a graphical visualization of the events that are in the dataset -- with a color per event type. It it clusters samples/events"}
oncoprint(aCML, 
    legend = FALSE, 
    samples.cluster = TRUE, 
    gene.annot = list(one = list('NRAS', 'SETBP1'), two = list('EZH2', 'TET2')),
    gene.annot.color = 'Set2',
    genes.cluster = TRUE)
```

Oncoprints can be annotated; a special type of annotation is given by stage data. As this is not available for the aCML dataset, we create it randomly, just for the sake of showing how the oncoprint is enriched with this information. This is the random stage map that we create -- if some samples had no stage a NA would be added automatically. 


```{r}
stages = c(rep('stage 1', 32), rep('stage 2', 32))
stages = as.matrix(stages)
rownames(stages) = as.samples(aCML)
dataset = annotate.stages(aCML, stages = stages)
has.stages(aCML)
head(as.stages(dataset))
```


The `as.stages` function can now be used to create a view over stages.

```{r}
head(as.stages(dataset))
```
After that the data is annotated via `annotate.stages` function, we can again plot an oncoprint -- which this time will detect that the dataset has also stages associated, and will diplay those 


```{r fig.width=6, fig.height=5}
oncoprint(dataset, legend = FALSE)
```


If one is willing to display samples grouped according to some variable, for instance after a sample clustering task, he can use `group.samples` parameter of `oncoprint` and that will override the mutual exclusivity ordering. Here, we make the trick of using the stages as if they were such clustering result.


```{r fig.width=6, fig.height=5, fig.cap="Example \texttt{oncoprint} output for aCML data with randomly annotated stages, in left, and samples clustered by group assignment in right -- for simplicity the group variable is again the stage annotation."}
oncoprint(dataset, group.samples = as.stages(dataset))
```
    
## Groups visualization (e.g., pathways)

TRONCO provides functions to visualize groups of events, which in this case are called pathways -- though this could be any group that one would like to define. Aggregation happens with the same rational as the `as.alterations` function, namely by merging the events in the group.

We make an example of a pathway called *MyPATHWAY* involving genes SETBP1, EZH2 and WT1; we want it to be colored in red, and we want to have the genotype of each event to be maintened in the dataset. We proceed as follows  (R's output is omitted).

```{r}
pathway = as.pathway(aCML,
    pathway.genes = c('SETBP1', 'EZH2', 'WT1'),
    pathway.name = 'MyPATHWAY',
    pathway.color = 'red',
    aggregate.pathway = FALSE)
```

Which we then visualize with an `oncoprint`

```{r onco-pathway, fig.width=6.5, fig.height=2, fig.cap="Oncoprint output of a custom pathway called MyPATHWAY involving genes SETBP1, EZH2 and WT1; the genotype of each event is shown."}
oncoprint(pathway, title = 'Custom pathway',  font.row = 8, cellheight = 15, cellwidth = 4)
```

In TRONCO there is also a function which creates the pathway view and the corresponding oncoprint to  multiple pathways, when these are given as a list. We make here a simple example of two custom pathways.


```{r fig.width=6.5, fig.height=1.8, fig.cap="Oncoprint output of a custom pair of pathways, with events shown"}
pathway.visualization(aCML, 
    pathways=list(P1 = c('TET2', 'IRAK4'),  P2=c('SETBP1', 'KIT')),        
    aggregate.pathways=FALSE,
    font.row = 8)
```

If we had to visualize just the signature of the pathway, we could set  `aggregate.pathways=T`.

```{r fig.width=6.5, fig.height=1, fig.cap="Oncoprint output of a custom pair of pathways, with events hidden"}
pathway.visualization(aCML, 
    pathways=list(P1 = c('TET2', 'IRAK4'),  P2=c('SETBP1', 'KIT')),
    aggregate.pathways = TRUE,
    font.row = 8)
```

The same operation could have been done using [WikiPathways](https://wikipathways.org). We can query WikiPathways and collect HGNC gene symbols and titles for pathways of interest as follows. (R's output is omitted).

```{r eval=FALSE}
library(rWikiPathways)
# quotes inside query to require both terms
my.pathways <- findPathwaysByText('SETBP1 EZH2 TET2 IRAK4 SETBP1 KIT')
human.filter <- lapply(my.pathways, function(x) x$species == "Homo sapiens")
my.hs.pathways <- my.pathways[unlist(human.filter)] 
# collect pathways idenifiers
my.wpids <- sapply(my.hs.pathways, function(x) x$id)

pw.title<-my.hs.pathways[[1]]$name
pw.genes<-getXrefList(my.wpids[1],"H") 
```

Now `pw.genes` and `pw.title` can be used as input for the function `as.pathway`.
It is also possible to view and edit these pathways at WikiPathways using the following commands to open tabs in your default browser.

```{r wikipathways, eval=FALSE}
browseURL(getPathwayInfo(my.wpids[1])[2])
browseURL(getPathwayInfo(my.wpids[2])[2])
browseURL(getPathwayInfo(my.wpids[3])[2])
```