Author: Zuguang Gu ( z.gu@dkfz.de )
Date: 2015-10-14
OncoPrint is a way to visualize multiple genomic alteration events as a heatmap.
The ComplexHeatmap package provides an oncoPrint() function for this purpose.
Besides the default style, which follows cBioPortal, there are
additional barplots on both sides of the heatmap which show the numbers of different alterations for
each sample and for each gene. Because oncoPrint() builds on ComplexHeatmap, you also have more
flexible control over the plot.
There are two forms of input data. The first is a matrix in which each element is a string that can encode several alterations. In the following example, 'g1' in 's1' has two types of alterations, 'snv' and 'indel'.
mat = read.table(textConnection(c(
    "\ts1\ts2\ts3",
    "g1\tsnv;indel\tsnv\tindel",
    "g2\t\tsnv;indel\tsnv",
    "g3\tsnv\t\tindel;snv")), row.names = 1, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
mat = as.matrix(mat)
mat
##    s1          s2          s3         
## g1 "snv;indel" "snv"       "indel"    
## g2 ""          "snv;indel" "snv"      
## g3 "snv"       ""          "indel;snv"
In this case, we need to define a function that extracts the alteration types from such a string and pass it
to the get_type argument. The function should return a vector of alteration types.
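As a quick check of what such a function returns, the sketch below applies the extractor used in this vignette to three cells of the example matrix. Only base R is used.

```r
# The extractor that is passed to get_type in this vignette.
get_type = function(x) strsplit(x, ";")[[1]]

get_type("snv;indel")   # two alteration types: "snv" "indel"
get_type("snv")         # one alteration type:  "snv"
get_type("")            # no alteration:        character(0)
```

An empty cell yields a zero-length vector, so no alteration type is reported for it.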
For one gene in one sample, several alteration types may be drawn into the same grid of the heatmap,
so we need to define how each type is drawn with self-defined functions.
If the graphics have no transparency, the order in which they are added
matters. In the following example, snv is drawn first and indel second. The rectangles
for indels are smaller than those for snvs, so you can see both snvs and indels if they
are in the same grid. Names in the list of functions should correspond to the alteration types (here, snv and indel).
Each self-defined graphic function takes four arguments: the position of the grid
on the heatmap (x and y) and the width and height of the grid (w and h).
Colors for the different alterations are defined in col. It should be a named vector whose names correspond
to the alteration types. It is used to generate the barplots and the legends.
library(ComplexHeatmap)
oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]],
    alter_fun_list = list(
        snv = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "red", col = NA)),
        indel = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.4, gp = gpar(fill = "blue", col = NA))
    ), col = c(snv = "red", indel = "blue"))
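To see why the order of the functions matters, the two rectangles can be drawn directly with the grid package, outside of oncoPrint(). This is only an illustration, assuming that graphics are added in list order as described above; draw_cell is a hypothetical helper, not a ComplexHeatmap function.

```r
library(grid)

alter_fun_list = list(
    snv = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "red", col = NA)),
    indel = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.4, gp = gpar(fill = "blue", col = NA))
)

# Hypothetical helper: draw every alteration type present in one grid,
# following the order of the function list rather than the order in `types`.
draw_cell = function(types, x, y, w, h, funs = alter_fun_list) {
    drawn = intersect(names(funs), types)
    for (type in drawn) funs[[type]](x, y, w, h)
    invisible(drawn)
}

grid.newpage()
size = unit(0.3, "npc")
# Left grid: snv first, then the smaller indel on top, so both are visible.
draw_cell(c("indel", "snv"), unit(0.25, "npc"), unit(0.5, "npc"), size, size)
# Right grid: list reversed, so the larger snv rectangle hides the indel.
draw_cell(c("indel", "snv"), unit(0.75, "npc"), unit(0.5, "npc"), size, size,
    funs = rev(alter_fun_list))
```

With opaque fills, the larger rectangle has to come first in the list for the smaller one to remain visible.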
The second form of input data is a list of matrices in which each matrix contains binary values indicating whether the alteration is absent or present. The names of the list should correspond to the alteration types.
mat_list = list(snv = matrix(c(1, 0, 1, 1, 1, 0, 0, 1, 1), nrow = 3),
                indel = matrix(c(1, 0, 0, 0, 1, 0, 1, 0, 0), nrow = 3))
rownames(mat_list$snv) = rownames(mat_list$indel) = c("g1", "g2", "g3")
colnames(mat_list$snv) = colnames(mat_list$indel) = c("s1", "s2", "s3")
mat_list
## $snv
##    s1 s2 s3
## g1  1  1  0
## g2  0  1  1
## g3  1  0  1
## 
## $indel
##    s1 s2 s3
## g1  1  0  1
## g2  0  1  0
## g3  0  0  0
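Both input forms carry the same kind of information, so a string matrix can be turned into a list of binary matrices. A minimal base R sketch follows; to_mat_list is a hypothetical helper written for this illustration and is not part of ComplexHeatmap. It reuses the string matrix from the first example, so its indel matrix differs from the hand-written mat_list above.

```r
# String matrix from the first example (filled column by column).
mat = matrix(c("snv;indel", "",          "snv",
               "snv",       "snv;indel", "",
               "indel",     "snv",       "indel;snv"),
    nrow = 3, dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

get_type = function(x) strsplit(x, ";")[[1]]

# Hypothetical helper: one 0/1 matrix per alteration type.
to_mat_list = function(mat, get_type) {
    types = unique(unlist(lapply(mat, get_type)))
    lt = lapply(types, function(type) {
        # 1 if `type` is among the alterations encoded in the cell, else 0
        apply(mat, c(1, 2), function(x) as.numeric(type %in% get_type(x)))
    })
    names(lt) = types
    lt
}

mat_list = to_mat_list(mat, get_type)
mat_list
```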
oncoPrint() expects all matrices in mat_list to have the same row names and column names. Users can use unify_mat_list()
to adjust the matrix list.
mat_list$indel = mat_list$indel[1:2, 1:2]
mat_list
## $snv
##    s1 s2 s3
## g1  1  1  0
## g2  0  1  1
## g3  1  0  1
## 
## $indel
##    s1 s2
## g1  1  0
## g2  0  1
mat_list = unify_mat_list(mat_list)
mat_list
## $snv
##    s1 s2 s3
## g1  1  1  0
## g2  0  1  1
## g3  1  0  1
## 
## $indel
##    s1 s2 s3
## g1  1  0  0
## g2  0  1  0
## g3  0  0  0
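The result shown above can be reproduced in a few lines of base R: every matrix is expanded to the union of all row and column names, and the new cells are filled with 0. The sketch below only illustrates that idea, with pad_to_union as a hypothetical name; it makes no claim about how unify_mat_list() is implemented.

```r
# Hypothetical helper: expand every matrix to the union of all dimnames.
pad_to_union = function(mat_list, default = 0) {
    rn = unique(unlist(lapply(mat_list, rownames)))
    cn = unique(unlist(lapply(mat_list, colnames)))
    lapply(mat_list, function(m) {
        out = matrix(default, nrow = length(rn), ncol = length(cn),
            dimnames = list(rn, cn))
        out[rownames(m), colnames(m)] = m   # copy the known cells by name
        out
    })
}

snv = matrix(c(1, 0, 1, 1, 1, 0, 0, 1, 1), nrow = 3,
    dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))
indel = matrix(c(1, 0, 0, 1), nrow = 2,
    dimnames = list(c("g1", "g2"), c("s1", "s2")))

padded = pad_to_union(list(snv = snv, indel = indel))
padded$indel
```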
The call is the same as in the first example, but here we also define background in the alter_fun_list argument. This function defines
how to add graphics when there is no alteration, and it is always put first in the list.
oncoPrint(mat_list,
    alter_fun_list = list(
        background = function(x, y, w, h) NULL,
        snv = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "red", col = NA)),
        indel = function(x, y, w, h) grid.rect(x, y, w*0.9, h*0.4, gp = gpar(fill = "blue", col = NA))
    ), col = c(snv = "red", indel = "blue"))
If there are no more than two types of alterations and the purpose is only to have a quick look at the data, default graphics are added:
oncoPrint(mat_list)
Now we make an oncoPrint with real-world data. The data is retrieved from cBioPortal. Steps for getting the data are as follows:
In the results page,
The order of samples can also be downloaded from the results page,
First we read the data and do some pre-processing.
mat = read.table(paste0(system.file("extdata", package = "ComplexHeatmap"), 
    "/tcga_lung_adenocarcinoma_provisional_ras_raf_mek_jnk_signalling.txt"), 
    header = TRUE, stringsAsFactors = FALSE, sep = "\t")
mat[is.na(mat)] = ""          # replace missing values with empty strings
rownames(mat) = mat[, 1]      # the first column holds the sample IDs
mat = mat[, -1]
mat = mat[, -ncol(mat)]       # drop the last column
mat = t(as.matrix(mat))       # genes in rows, samples in columns
mat[1:3, 1:3]
##      TCGA-05-4384-01 TCGA-05-4390-01 TCGA-05-4425-01
## KRAS "  "            "MUT;"          "  "           
## HRAS "  "            "  "            "  "           
## BRAF "  "            "  "            "  "
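The cells printed above are not as clean as in the toy example: unaltered cells hold whitespace, and altered cells end with a separator. The sketch below shows what the plain strsplit() extractor returns for such values, next to a stricter variant. get_type_strict is a hypothetical alternative for illustration, and "AMP;MUT;" is a made-up cell with two alterations; how oncoPrint() itself treats whitespace-only entries is not covered here.

```r
get_type = function(x) strsplit(x, ";")[[1]]

# Hypothetical stricter extractor: trim whitespace and drop empty entries.
get_type_strict = function(x) {
    types = trimws(strsplit(x, ";")[[1]])
    types[nzchar(types)]
}

get_type("MUT;")              # "MUT": the trailing separator adds no empty element
get_type("  ")                # "  ": the whitespace comes back as if it were a type
get_type_strict("  ")         # character(0)
get_type_strict("AMP;MUT;")   # "AMP" "MUT"
```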
There are three different alterations in mat: HOMDEL, AMP and MUT. We first 
define how to add graphics which correspond to different alterations. 
alter_fun_list = list(
    background = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"), gp = gpar(fill = "#CCCCCC", col = NA))
    },
    HOMDEL = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"), gp = gpar(fill = "blue", col = NA))
    },
    AMP = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"), gp = gpar(fill = "red", col = NA))
    },
    MUT = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.5, "mm"), h*0.33, gp = gpar(fill = "#008000", col = NA))
    }
)
We also define colors for the different alterations, which will be used for the barplots.
col = c("MUT" = "#008000", "AMP" = "red", "HOMDEL" = "blue")
Make the oncoPrint and adjust heatmap components such as the title and the legend.
oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]],
    alter_fun_list = alter_fun_list, col = col, 
    column_title = "OncoPrint for TCGA Lung Adenocarcinoma, genes in Ras Raf MEK JNK signalling",
    heatmap_legend_param = list(title = "Alterations", at = c("AMP", "HOMDEL", "MUT"), 
        labels = c("Amplification", "Deep deletion", "Mutation")))
As you can see, the genes and samples are sorted automatically. Rows are sorted by the frequency
of the alterations across all samples, and columns are sorted to visualize the mutual exclusivity across genes,
based on the “memo sort” method, which is
kindly provided by B. Arman Aksoy. If you want
to turn off the default sorting, set row_order or column_order to NULL.
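The idea behind this kind of sorting can be sketched in base R. The version below is an assumption about the general approach (order genes by frequency, then score each sample so that alterations in more frequent genes weigh more), not a copy of the code used inside oncoPrint(); memo_sort and the toy matrix are made up for this illustration.

```r
# Hypothetical helper operating on a 0/1 matrix (genes in rows, samples in columns).
memo_sort = function(m) {
    # Genes with more altered samples come first.
    m = m[order(rowSums(m), decreasing = TRUE), , drop = FALSE]
    # Row i (after sorting) gets weight 2^(n - i), so an alteration in a more
    # frequent gene outweighs all alterations in less frequent genes combined.
    weights = 2^(nrow(m) - seq_len(nrow(m)))
    scores = colSums(m * weights)
    m[, order(scores, decreasing = TRUE), drop = FALSE]
}

alt = matrix(c(0, 1, 0, 1,
               1, 1, 1, 0,
               0, 0, 1, 0), nrow = 3, byrow = TRUE,
    dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3", "s4")))

sorted = memo_sort(alt)
sorted
```

For this toy matrix the genes come out as g2, g1, g3 and the samples as s2, s3, s1, s4, which places samples altered in the most frequent gene on the left.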
By default, if one sample has no alteration, it will still remain in the heatmap, but you can set
remove_empty_columns to TRUE to remove it:
oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]],
    alter_fun_list = alter_fun_list, col = col, 
    remove_empty_columns = TRUE,
    column_title = "OncoPrint for TCGA Lung Adenocarcinoma, genes in Ras Raf MEK JNK signalling",
    heatmap_legend_param = list(title = "Alterations", at = c("AMP", "HOMDEL", "MUT"), 
        labels = c("Amplification", "Deep deletion", "Mutation")))
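Which samples count as empty can also be checked by hand before plotting. A base R sketch, assuming that a cell is unaltered when it contains nothing but whitespace or separators; the small matrix, its sample names, and the helper has_alteration are made up for illustration.

```r
# Toy string matrix (filled column by column); sample "b" has no alteration.
mat = matrix(c("MUT;", "  ",
               "  ",   "  ",
               "AMP;", "  "), nrow = 2,
    dimnames = list(c("KRAS", "BRAF"), c("a", "b", "c")))

# Hypothetical helper: TRUE where a cell holds something other than
# whitespace or the ";" separator.
has_alteration = function(mat) {
    array(grepl("[^[:space:];]", mat), dim = dim(mat), dimnames = dimnames(mat))
}

keep = colSums(has_alteration(mat)) > 0
mat[, keep, drop = FALSE]
```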
As with the normal Heatmap() function, row_order or column_order can be assigned a vector of 
orders (either numeric or character). In the following example, the order of samples is also taken from cBio.
You can see the difference in sample order between 'memo sort' and the method used by cBio.
sample_order = scan(paste0(system.file("extdata", package = "ComplexHeatmap"), 
    "/sample_order.txt"), what = "character")
oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]],
    alter_fun_list = alter_fun_list, col = col, 
    row_order = NULL, column_order = sample_order,
    remove_empty_columns = TRUE,
    column_title = "OncoPrint for TCGA Lung Adenocarcinoma, genes in Ras Raf MEK JNK signalling",
    heatmap_legend_param = list(title = "Alterations", at = c("AMP", "HOMDEL", "MUT"), 
        labels = c("Amplification", "Deep deletion", "Mutation")))
oncoPrint() actually returns a HeatmapList object, so you can add more heatmaps or row annotations
to it to visualize more complicated information.
The following example splits the heatmap into two halves and adds a new heatmap to the right.
ht_list = oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]],
    alter_fun_list = alter_fun_list, col = col, 
    remove_empty_columns = TRUE,
    column_title = "OncoPrint for TCGA Lung Adenocarcinoma, genes in Ras Raf MEK JNK signalling",
    heatmap_legend_param = list(title = "Alterations", at = c("AMP", "HOMDEL", "MUT"), 
        labels = c("Amplification", "Deep deletion", "Mutation")),
    split = sample(letters[1:2], nrow(mat), replace = TRUE)) +
Heatmap(matrix(rnorm(nrow(mat)*10), ncol = 10), width = unit(4, "cm"))
draw(ht_list, row_sub_title_side = "left")
sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.3 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
##  [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] stats4    parallel  grid      stats     graphics  grDevices utils     datasets  methods  
## [10] base     
## 
## other attached packages:
##  [1] GetoptLong_0.1.0     dendextend_1.1.0     dendsort_0.3.2       cluster_2.0.3       
##  [5] HilbertCurve_1.0.0   GenomicRanges_1.22.0 GenomeInfoDb_1.6.0   IRanges_2.4.0       
##  [9] S4Vectors_0.8.0      BiocGenerics_0.16.0  circlize_0.3.1       ComplexHeatmap_1.6.0
## [13] knitr_1.11           markdown_0.7.7      
## 
## loaded via a namespace (and not attached):
##  [1] whisker_0.3-2       XVector_0.10.0      magrittr_1.5        zlibbioc_1.16.0    
##  [5] lattice_0.20-33     colorspace_1.2-6    rjson_0.2.15        stringr_1.0.0      
##  [9] tools_3.2.2         png_0.1-7           RColorBrewer_1.1-2  formatR_1.2.1      
## [13] HilbertVis_1.28.0   GlobalOptions_0.0.8 shape_1.4.2         evaluate_0.8       
## [17] mime_0.4            stringi_0.5-5