This vignette illustrates use cases and visualizations of the data found in the depmap package. See the depmap vignette for details about the datasets.
The depmap
package aims to provide a reproducible research framework for cancer dependency
data described by Tsherniak, Aviad, et al. “Defining a cancer dependency map.”
Cell 170.3 (2017): 564-576. The
data found in the depmap
package has been formatted to facilitate the use of common R packages such as
dplyr and ggplot2. We hope that this package will allow researchers to more
easily mine, explore and visually illustrate dependency data taken from the
Depmap cancer genomic dependency study.
Perhaps the most interesting datasets found within the
depmap
package are those that relate to the cancer gene dependency score, such as
rnai and crispr. These datasets contain a score expressing how vital a
particular gene is in terms of how lethal the knockout/knockdown of that gene is
on a target cell line. For example, a highly negative dependency score implies
that a cell line is highly dependent on that gene.
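As a minimal sketch of how such scores are read, the toy example below sorts a handful of made-up dependency values so that the most negative (strongest) dependencies come first. The values, the cell line and gene labels, and the `cutoff` of -1 are assumptions chosen purely for illustration; they are not taken from the depmap data and do not represent a recommended threshold. Only the column names (`cell_line`, `gene_name`, `dependency`) mirror those used later in this vignette.

```r
library(dplyr)
library(tibble)

# Toy data: made-up scores, NOT real depmap values
toy_scores <- tibble(
    cell_line  = c("LINE_A", "LINE_A", "LINE_B", "LINE_B"),
    gene_name  = c("GENE1", "GENE2", "GENE1", "GENE2"),
    dependency = c(-2.5, 0.1, -0.3, -1.8)
)

# Hypothetical cutoff, used here only to flag strongly negative scores
cutoff <- -1

strong <- toy_scores %>%
    dplyr::arrange(dependency) %>%                        # most negative first
    dplyr::mutate(strong_dependency = dependency < cutoff)

strong
```

Sorting in ascending order places the gene and cell line pairs with the greatest apparent dependency at the top, which is the same ordering used for the real rnai data later on.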
Load necessary libraries.
library("dplyr")
library("ggplot2")
library("viridis")
library("tibble")
library("gridExtra")
library("stringr")
library("depmap")
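library("ExperimentHub")
Load the rnai, mutationCalls, metadata, TPM and copyNumber datasets for
visualization (the crispr and drug_sensitivity lines are commented out and can
be enabled as needed). Note: the datasets loaded below are from the 19Q3
release. Newer datasets, such as those from the 20Q1 release, are available.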
## create ExperimentHub query object
eh <- ExperimentHub()
query(eh, "depmap")
## ExperimentHub with 43 records
## # snapshotDate(): 2020-10-02
## # $dataprovider: Broad Institute
## # $species: Homo sapiens
## # $rdataclass: tibble
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH2260"]]' 
## 
##            title             
##   EH2260 | rnai_19Q1         
##   EH2261 | crispr_19Q1       
##   EH2262 | copyNumber_19Q1   
##   EH2263 | RPPA_19Q1         
##   EH2264 | TPM_19Q1          
##   ...      ...               
##   EH3797 | crispr_20Q3       
##   EH3798 | copyNumber_20Q3   
##   EH3799 | TPM_20Q3          
##   EH3800 | mutationCalls_20Q3
##   EH3801 | metadata_20Q3
rnai <- eh[["EH3080"]]
mutationCalls <- eh[["EH3085"]]
metadata <- eh[["EH3086"]]
TPM <- eh[["EH3084"]]
copyNumber <- eh[["EH3082"]]
# crispr <- eh[["EH3081"]]
# drug_sensitivity <- eh[["EH3087"]]
By importing the depmap data into the R environment, the data can be mined
more effectively. For example, if one is interested in researching soft tissue
sarcomas and wants to search all such cancer cell lines for the gene with the
greatest dependency, this can be accomplished using the data
manipulation and visualization tools dplyr and ggplot2. Below, the rnai
dataset is filtered for cell lines with “SOFT_TISSUE” in the CCLE name, and
the ten most negative (i.e. strongest) dependency scores are listed.
## list of dependency scores
rnai %>% dplyr::select(cell_line, gene_name, dependency) %>%
         dplyr::filter(stringr::str_detect(cell_line, "SOFT_TISSUE")) %>%
         dplyr::arrange(dependency) %>% 
         head(10)
## # A tibble: 10 x 3
##    cell_line          gene_name dependency
##    <chr>              <chr>          <dbl>
##  1 FUJI_SOFT_TISSUE   RPL14          -3.60
##  2 SJRH30_SOFT_TISSUE RAN            -3.41
##  3 SJRH30_SOFT_TISSUE RPL14          -3.36
##  4 SJRH30_SOFT_TISSUE RBX1           -3.31
##  5 HS729_SOFT_TISSUE  PSMA3          -3.22
##  6 SJRH30_SOFT_TISSUE RUVBL2         -3.13
##  7 KYM1_SOFT_TISSUE   RPL14          -3.03
##  8 RH41_SOFT_TISSUE   RBX1           -3.01
##  9 HS729_SOFT_TISSUE  NUTF2          -2.90
## 10 SJRH30_SOFT_TISSUE NUTF2          -2.85
As the gene RPL14 appears several times among the top dependency scores, it may
make an interesting candidate target. Below, the rnai data is
displayed as a histogram showing the distribution of dependency scores for gene
RPL14.
## Basic histogram
## (the dotted line marks the mean dependency across all genes in rnai)
rnai %>% dplyr::select(gene, gene_name, dependency) %>% 
         dplyr::filter(gene_name == "RPL14") %>% 
         ggplot(aes(x = dependency)) +
         geom_histogram() +
         geom_vline(xintercept = mean(rnai$dependency, na.rm = TRUE),
                    linetype = "dotted", color = "red") +
         ggtitle("Histogram of dependency scores for gene RPL14")
A more complex plot of the rnai data, shown below, displays the
distribution of dependency scores for gene RPL14 for each major type of
cancer, while highlighting the nature of mutations of this gene in such cancer
cell lines (e.g. whether they are COSMIC hotspots, damaging, etc.). Notice that
this plot reflects the same overall distribution as the histogram above, now in
two dimensions.
meta_rnai <- metadata %>%
             dplyr::select(depmap_id, lineage) %>%
             dplyr::full_join(rnai, by = "depmap_id") %>%
             dplyr::filter(gene_name == "RPL14") %>% 
             dplyr::full_join((mutationCalls %>%
                              dplyr::select(depmap_id, entrez_id,
                                            is_cosmic_hotspot, var_annotation)),
                                            by = c("depmap_id", "entrez_id"))
p1 <- meta_rnai %>%
      ggplot(aes(x = dependency, y = lineage)) +
      geom_point(alpha = 0.4, size = 0.5) +
      geom_point(data = subset(
         meta_rnai, var_annotation == "damaging"), color = "red") +
      geom_point(data = subset(
         meta_rnai, var_annotation == "other non-conserving"), color = "blue") +
      geom_point(data = subset(
         meta_rnai, var_annotation == "other conserving"), color = "cyan") +
      geom_point(data = subset(
         meta_rnai, is_cosmic_hotspot == TRUE), color = "orange") +
      geom_vline(xintercept=mean(meta_rnai$dependency, na.rm = TRUE),
                 linetype = "dotted", color = "red") +
      ggtitle("Scatterplot of dependency scores for gene RPL14 by lineage")
p1
Below is a boxplot displaying expression values for gene RPL14 by lineage:
metadata %>%
      dplyr::select(depmap_id, lineage) %>%
      dplyr::full_join(TPM, by = "depmap_id") %>%
      dplyr::filter(gene_name == "RPL14") %>% 
      ggplot(aes(x = lineage, y = expression, fill = lineage)) +
      geom_boxplot(outlier.alpha = 0.1) +
      ggtitle("Boxplot of expression values for gene RPL14 by lineage") +
      theme(axis.text.x = element_text(angle = 45, hjust=1)) +
      theme(legend.position = "none")
High-dependency, high-expression genes are more likely to be interesting research targets. Below is a plot of expression vs rnai gene dependency for Rhabdomyosarcoma, a subtype of Sarcoma:
## expression vs rnai gene dependency for Rhabdomyosarcoma Sarcoma
sarcoma <- metadata %>%
           dplyr::select(depmap_id, cell_line,
                         primary_disease, subtype_disease) %>%
           dplyr::filter(primary_disease == "Sarcoma",
                         subtype_disease == "Rhabdomyosarcoma")
rnai_sub <- rnai %>% dplyr::select(depmap_id, gene, gene_name, dependency)
tpm_sub <- TPM %>% dplyr::select(depmap_id, gene, gene_name, expression)
sarcoma_dep <- sarcoma %>%
               dplyr::left_join(rnai_sub, by = "depmap_id") %>%
               dplyr::select(-cell_line, -primary_disease,
                             -subtype_disease, -gene_name)
sarcoma_exp <- sarcoma %>% dplyr::left_join(tpm_sub, by = "depmap_id")
sarcoma_dat_exp <- dplyr::full_join(sarcoma_dep, sarcoma_exp,
                             by = c("depmap_id", "gene")) %>%
                             dplyr::filter(!is.na(expression))
p2 <- ggplot(data = sarcoma_dat_exp, aes(x = dependency, y = expression)) +
      geom_point(alpha = 0.4, size = 0.5) +
      geom_vline(xintercept=mean(sarcoma_dat_exp$dependency, na.rm = TRUE),
                 linetype = "dotted", color = "red") +
      geom_hline(yintercept=mean(sarcoma_dat_exp$expression, na.rm = TRUE),
                 linetype = "dotted", color = "red") +
      ggtitle("Scatterplot of rnai dependency vs expression values for gene")
p2 + theme(axis.text.x = element_text(angle = 45))
Below is a selection of the genes shown above with the lowest dependency scores; the last column displays gene expression in TPM.
sarcoma_dat_exp %>%
    dplyr::select(cell_line, gene_name, dependency, expression) %>%
    dplyr::arrange(dependency) %>% 
    head(10)
## # A tibble: 10 x 4
##    cell_line        gene_name dependency expression
##    <chr>            <chr>          <dbl>      <dbl>
##  1 A204_SOFT_TISSUE RPS27A         -2.62      10.6 
##  2 A204_SOFT_TISSUE RPL14          -2.34      10.0 
##  3 A204_SOFT_TISSUE RPL7           -2.23      11.5 
##  4 A204_SOFT_TISSUE RPS16          -2.08      11.2 
##  5 A204_SOFT_TISSUE RPS15A         -1.92      11.6 
##  6 A204_SOFT_TISSUE RBX1           -1.91       6.51
##  7 A204_SOFT_TISSUE SF3B2          -1.80       7.47
##  8 A204_SOFT_TISSUE RPL5           -1.79      10.7 
##  9 A204_SOFT_TISSUE RPS3A          -1.77      11.4 
## 10 A204_SOFT_TISSUE RPL13          -1.68      11.6 
Below is a boxplot displaying log genomic copy number for gene RPL14 by
lineage:
metadata %>%
    dplyr::select(depmap_id, lineage) %>%
    dplyr::full_join(copyNumber, by = "depmap_id") %>%
    dplyr::filter(gene_name == "RPL14") %>%
    ggplot(aes(x = lineage, y = log_copy_number, fill = lineage)) +
    geom_boxplot(outlier.alpha = 0.1) +
    ggtitle("Boxplot of log copy number for gene RPL14 by lineage") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    theme(legend.position = "none")
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] stringr_1.4.0        gridExtra_2.3        tibble_3.0.4        
##  [4] viridis_0.5.1        viridisLite_0.3.0    ggplot2_3.3.2       
##  [7] ExperimentHub_1.16.0 AnnotationHub_2.22.0 BiocFileCache_1.14.0
## [10] dbplyr_1.4.4         BiocGenerics_0.36.0  depmap_1.4.0        
## [13] dplyr_1.0.2          BiocStyle_2.18.0    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.5                    assertthat_0.2.1             
##  [3] digest_0.6.27                 utf8_1.1.4                   
##  [5] mime_0.9                      R6_2.5.0                     
##  [7] stats4_4.0.3                  RSQLite_2.2.1                
##  [9] evaluate_0.14                 httr_1.4.2                   
## [11] pillar_1.4.6                  rlang_0.4.8                  
## [13] curl_4.3                      blob_1.2.1                   
## [15] magick_2.5.0                  S4Vectors_0.28.0             
## [17] rmarkdown_2.5                 labeling_0.4.2               
## [19] bit_4.0.4                     munsell_0.5.0                
## [21] shiny_1.5.0                   compiler_4.0.3               
## [23] httpuv_1.5.4                  xfun_0.18                    
## [25] pkgconfig_2.0.3               htmltools_0.5.0              
## [27] tidyselect_1.1.0              interactiveDisplayBase_1.28.0
## [29] bookdown_0.21                 IRanges_2.24.0               
## [31] fansi_0.4.1                   crayon_1.3.4                 
## [33] withr_2.3.0                   later_1.1.0.1                
## [35] rappdirs_0.3.1                grid_4.0.3                   
## [37] xtable_1.8-4                  gtable_0.3.0                 
## [39] lifecycle_0.2.0               DBI_1.1.0                    
## [41] magrittr_1.5                  scales_1.1.1                 
## [43] cli_2.1.0                     stringi_1.5.3                
## [45] farver_2.0.3                  promises_1.1.1               
## [47] ellipsis_0.3.1                generics_0.0.2               
## [49] vctrs_0.3.4                   tools_4.0.3                  
## [51] bit64_4.0.5                   Biobase_2.50.0               
## [53] glue_1.4.2                    purrr_0.3.4                  
## [55] BiocVersion_3.12.0            fastmap_1.0.1                
## [57] yaml_2.2.1                    AnnotationDbi_1.52.0         
## [59] colorspace_1.4-1              BiocManager_1.30.10          
## [61] memoise_1.1.0                 knitr_1.30