1 Prelude

library(tidyverse)
library(ComplexHeatmap)
library(circlize)
library(GGally)

library(pareg)
data(pathway_similarities, package = "pareg")

set.seed(42)

2 Introduction

Pathway similarities quantify how alike two pathways are. For example, when interpreting pathways as gene sets, one could simply count how many genes are shared between two sets. More sophisticated measures, such as the Jaccard index, normalize this count by the sizes of the sets (Gu and Huebschmann 2021).
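To make the set-based view concrete, here is a minimal sketch on two made-up gene sets. The pathway contents and all variable names are hypothetical, and the formulas are the textbook definitions of the Jaccard index and the overlap coefficient; the values pre-computed in pareg may be derived differently in detail.

```r
# Two hypothetical pathways, represented as sets of gene symbols.
pathway_a <- c("TP53", "MDM2", "CDKN1A", "BAX")
pathway_b <- c("TP53", "BAX", "BCL2", "CASP3", "CASP9")

# Simplest similarity: the number of shared genes.
shared <- intersect(pathway_a, pathway_b)
n_shared <- length(shared)

# Jaccard index: shared genes relative to all genes in either set.
jaccard <- n_shared / length(union(pathway_a, pathway_b))

# Overlap coefficient: shared genes relative to the smaller set.
overlap_coefficient <- n_shared / min(length(pathway_a), length(pathway_b))

c(n_shared = n_shared, jaccard = jaccard, overlap = overlap_coefficient)
```

The two normalized measures already disagree on this toy input, because the overlap coefficient ignores the size of the larger set.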

pareg provides various pre-computed similarity measures (jaccard, overlap_coefficient, semantic) for selected pathway databases (C2@CP:KEGG, C5@GO:BP) in matrix form.

mat <- pathway_similarities$`C2@CP:KEGG`$jaccard %>%
  as_dense_sim()
mat[1:3, 1:3]
##          hsa00970    hsa05340    hsa04621
## hsa00970        1 0.000000000 0.000000000
## hsa05340        0 1.000000000 0.008196721
## hsa04621        0 0.008196721 1.000000000
Heatmap(
  mat,
  name = "similarity",
  col = colorRamp2(c(0, 1), c("white", "black")),
  show_row_names = FALSE,
  show_column_names = FALSE
)
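Besides plotting, it can help to list the most similar pathway pairs directly. The following sketch uses only base R on a small stand-in matrix built from the 3 x 3 excerpt printed above; the names toy, idx, and pairs are hypothetical, and the same steps should apply to any dense, symmetric similarity matrix with row and column names.

```r
# Toy symmetric similarity matrix, copied from the printed excerpt.
ids <- c("hsa00970", "hsa05340", "hsa04621")
toy <- matrix(
  c(1, 0,           0,
    0, 1,           0.008196721,
    0, 0.008196721, 1),
  nrow = 3, byrow = TRUE, dimnames = list(ids, ids)
)

# Keep each unordered pair once by looking at the upper triangle only.
idx <- which(upper.tri(toy), arr.ind = TRUE)
pairs <- data.frame(
  pathway_1  = rownames(toy)[idx[, "row"]],
  pathway_2  = colnames(toy)[idx[, "col"]],
  similarity = toy[idx]
)

# Sort so that the most similar pairs come first.
pairs <- pairs[order(pairs$similarity, decreasing = TRUE), ]
pairs
```

Restricting to the upper triangle drops the diagonal (every pathway is trivially identical to itself) and avoids reporting each pair twice.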

3 Comparison of similarity measures

On the Gene Ontology’s Biological Process subcategory, we can observe how much pathway similarity measures can differ from each other.

df_sim <- pathway_similarities$`C5@GO:BP` %>%
  map_dfr(function(mat) {
    if (is.null(mat)) {
      return(NULL)
    }

    mat %>%
      as_dense_sim() %>%
      as.data.frame() %>%
      rownames_to_column() %>%
      pivot_longer(-rowname)
  }, .id = "measure") %>%
  filter(value > 0) %>%
  pivot_wider(names_from = measure, values_from = value) %>%
  select(-rowname, -name)

ggpairs(df_sim) +
  theme_minimal()
## Warning: Removed 514552 rows containing non-finite outside the scale range
## (`stat_density()`).
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 514552 rows containing missing values
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 586334 rows containing missing values
## Warning: Removed 514552 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 514552 rows containing non-finite outside the scale range
## (`stat_density()`).
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 586334 rows containing missing values
## Warning: Removed 586334 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 586334 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 71782 rows containing non-finite outside the scale range
## (`stat_density()`).
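
The warnings about removed rows most likely stem from the reshaping step: zero similarities are filtered out per measure before widening, so a pathway pair that is positive under one measure but zero under another ends up with an NA in the wide table. The sketch below reproduces this on a made-up data set (all pathway names and values are hypothetical) and shows one way to summarize agreement between measures despite the missing entries, using pairwise-complete Spearman correlation.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format similarities: four pathway pairs, two measures.
toy_long <- data.frame(
  rowname = rep(c("p1", "p1", "p2", "p1"), times = 2),
  name    = rep(c("p2", "p3", "p3", "p4"), times = 2),
  measure = rep(c("jaccard", "semantic"), each = 4),
  value   = c(0.2, 0, 0.5, 0.3, 0.4, 0.1, 0.9, 0.2)
)

# Same reshaping as above: zeros are dropped before widening, so the pair
# (p1, p3) has no jaccard row left and becomes NA in the jaccard column.
toy_wide <- toy_long %>%
  filter(value > 0) %>%
  pivot_wider(names_from = measure, values_from = value) %>%
  select(-rowname, -name)

# Number of missing entries per measure.
na_counts <- colSums(is.na(toy_wide))

# Rank correlation, using only the pairs observed under both measures.
rho <- cor(toy_wide, method = "spearman", use = "pairwise.complete.obs")
```

If the zeros should count as genuine observations rather than missing values, an alternative is to skip the filter or to pass values_fill = 0 to pivot_wider(); which choice is appropriate depends on the question being asked.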

4 Session information

sessionInfo()
## R version 4.4.0 RC (2024-04-16 r86468 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=C                          
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] GGally_2.2.1          circlize_0.4.16       pareg_1.9.0          
##  [4] tfprobability_0.15.1  tensorflow_2.16.0     enrichplot_1.25.0    
##  [7] ComplexHeatmap_2.21.0 lubridate_1.9.3       forcats_1.0.0        
## [10] stringr_1.5.1         dplyr_1.1.4           purrr_1.0.2          
## [13] readr_2.1.5           tidyr_1.3.1           tibble_3.2.1         
## [16] tidyverse_2.0.0       ggraph_2.2.1          ggplot2_3.5.1        
## [19] BiocStyle_2.33.0     
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.4.0           later_1.3.2             ggplotify_0.1.2        
##   [4] filelock_1.0.3          polyclip_1.10-6         basilisk.utils_1.17.0  
##   [7] lifecycle_1.0.4         doParallel_1.0.17       globals_0.16.3         
##  [10] lattice_0.22-6          MASS_7.3-60.2           magrittr_2.0.3         
##  [13] sass_0.4.9              rmarkdown_2.26          jquerylib_0.1.4        
##  [16] yaml_2.3.8              remotes_2.5.0           httpuv_1.6.15          
##  [19] doRNG_1.8.6             sessioninfo_1.2.2       pkgbuild_1.4.4         
##  [22] reticulate_1.36.1       cowplot_1.1.3           DBI_1.2.2              
##  [25] RColorBrewer_1.1-3      keras_2.15.0            pkgload_1.3.4          
##  [28] zlibbioc_1.51.0         BiocGenerics_0.51.0     yulab.utils_0.1.4      
##  [31] tweenr_2.0.3            GenomeInfoDbData_1.2.12 IRanges_2.39.0         
##  [34] S4Vectors_0.43.0        ggrepel_0.9.5           listenv_0.9.1          
##  [37] tidytree_0.4.6          parallelly_1.37.1       codetools_0.2-20       
##  [40] DOSE_3.31.0             ggforce_0.4.2           tidyselect_1.2.1       
##  [43] shape_1.4.6.1           aplot_0.2.2             UCSC.utils_1.1.0       
##  [46] farver_2.1.1            viridis_0.6.5           doFuture_1.0.1         
##  [49] matrixStats_1.3.0       stats4_4.4.0            base64enc_0.1-3        
##  [52] jsonlite_1.8.8          GetoptLong_1.0.5        ellipsis_0.3.2         
##  [55] tidygraph_1.3.1         iterators_1.0.14        foreach_1.5.2          
##  [58] ggnewscale_0.4.10       progress_1.2.3          tools_4.4.0            
##  [61] treeio_1.29.0           Rcpp_1.0.12             glue_1.7.0             
##  [64] gridExtra_2.3           tfruns_1.5.3            xfun_0.43              
##  [67] qvalue_2.37.0           usethis_2.2.3           GenomeInfoDb_1.41.0    
##  [70] withr_3.0.0             BiocManager_1.30.22     fastmap_1.1.1          
##  [73] basilisk_1.17.0         fansi_1.0.6             digest_0.6.35          
##  [76] timechange_0.3.0        R6_2.5.1                mime_0.12              
##  [79] gridGraphics_0.5-1      colorspace_2.1-0        Cairo_1.6-2            
##  [82] GO.db_3.19.1            RSQLite_2.3.6           utf8_1.2.4             
##  [85] generics_0.1.3          data.table_1.15.4       prettyunits_1.2.0      
##  [88] graphlayouts_1.1.1      httr_1.4.7              htmlwidgets_1.6.4      
##  [91] scatterpie_0.2.2        ggstats_0.6.0           whisker_0.4.1          
##  [94] pkgconfig_2.0.3         gtable_0.3.5            blob_1.2.4             
##  [97] XVector_0.45.0          shadowtext_0.1.3        htmltools_0.5.8.1      
## [100] profvis_0.3.8           bookdown_0.39           fgsea_1.31.0           
## [103] clue_0.3-65             scales_1.3.0            Biobase_2.65.0         
## [106] png_0.1-8               ggfun_0.1.4             knitr_1.46             
## [109] tzdb_0.4.0              reshape2_1.4.4          rjson_0.2.21           
## [112] nloptr_2.0.3            nlme_3.1-164            proxy_0.4-27           
## [115] cachem_1.0.8            GlobalOptions_0.1.2     parallel_4.4.0         
## [118] miniUI_0.1.1.1          HDO.db_0.99.1           AnnotationDbi_1.67.0   
## [121] logger_0.3.0            pillar_1.9.0            vctrs_0.6.5            
## [124] urlchecker_1.0.1        promises_1.3.0          xtable_1.8-4           
## [127] cluster_2.1.6           evaluate_0.23           magick_2.8.3           
## [130] tinytex_0.50            zeallot_0.1.0           cli_3.6.2              
## [133] compiler_4.4.0          rngtools_1.5.2          rlang_1.1.3            
## [136] crayon_1.5.2            future.apply_1.11.2     labeling_0.4.3         
## [139] plyr_1.8.9              fs_1.6.4                stringi_1.8.3          
## [142] viridisLite_0.4.2       BiocParallel_1.39.0     munsell_0.5.1          
## [145] Biostrings_2.73.0       lazyeval_0.2.2          devtools_2.4.5         
## [148] GOSemSim_2.31.0         Matrix_1.7-0            dir.expiry_1.13.0      
## [151] hms_1.1.3               patchwork_1.2.0         bit64_4.0.5            
## [154] future_1.33.2           KEGGREST_1.45.0         shiny_1.8.1.1          
## [157] highr_0.10              igraph_2.0.3            memoise_2.0.1          
## [160] bslib_0.7.0             ggtree_3.13.0           fastmatch_1.1-4        
## [163] bit_4.0.5               ape_5.8

References

Gu, Zuguang, and Daniel Huebschmann. 2021. “SimplifyEnrichment: An R/Bioconductor Package for Clustering and Visualizing Functional Enrichment Results.” bioRxiv.