---
title: "Exploratory data analysis by querying the ToppGene Suite"
shorttitle: "toppgene"
author:
- name: Pariksheet Nanda
  affiliation: University of Pittsburgh
  email: pan79@pitt.edu
package: toppgene
abstract: >
  The ToppGene Suite is a one-stop portal for gene list enrichment
  analysis and candidate gene prioritization based on functional
  annotations and protein interactions network.  Although the
  ToppCluster web application provides convenient graphical access to
  the ToppGene Suite, the OpenAPI 3.0 compliant interface of ToppGene
  is better suited for automation and reproducibility.  This package
  was initially generated with OpenAPI Generator and supplemented with
  Bioconductor class interfaces and more relevant biological examples.
bibliography: references.bib
output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{toppgene}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r knitr-opts}
#| include = FALSE
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

# Overview

The `r BiocStyle::Biocpkg("toppgene")` package is a client for the
ToppGene Suite web server, which takes a gene list as input and
performs enrichment analysis.  To demonstrate its use, this vignette
reproduces the two test cases from the publication
[@chen_improved_2007]: congenital heart disease (CHD) and diabetic
retinopathy (DR).

# Installation

To install this package, start R and enter:

```{r install-from-bioc, eval = FALSE}
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("toppgene")
```

# Usage

## Prepare the gene lists

A query may contain one or more genes.  The `enrich()` function
requires integer Entrez gene IDs.  Because ToppGene's symbol
conversion is more permissive than Bioconductor's, use the package's
`lookup()` function to convert gene symbols to Entrez IDs.  The
published example provides gene symbols for CHD (n = 28) and DR
(n = 27), which we also use here.
```{r setup}
genes_chd_sym <- c(
    "ADD1", "CITED2", "DTNA", "CKM", "GATA4", "GJA1", "HAND1",
    "HAND2", "HEY2", "HOXC4", "HOXC5", "ITGB3", "JARID2", "MTHFD1",
    "MTHFR", "MTRR", "NKX2-5", "NOS3", "NPPA", "NPPB", "RFC1",
    "SALL4", "TBX1", "TBX5", "TBX20", "TGFB1", "ZFPM2", "ZIC3")

genes_dr_sym <- c(
    "ACE", "ADRB3", "AGT", "AGTR2", "AKR1B1", "APOE", "AR", "CMA1",
    "EDN1", "GNB3", "HFE", "HLA-DPB1", "HLA-DRB1", "ICAM1", "ITGA2B",
    "ITGB2", "LTA", "NOS2A", "NOS3", "NPY", "PECAM1", "PON1", "RAGE",
    "SELE", "SERPINE1", "TIMP3", "TNF")
```

## Convert gene symbols to Entrez IDs

```{r toppgene-lookup}
library(toppgene)

genes_chd <- lookup(genes_chd_sym)
genes_chd

genes_dr <- lookup(genes_dr_sym)
genes_dr
```

## Run enrichment queries

```{r toppgene-enrich}
enrich_chd <- enrich(genes_chd$Entrez)
enrich_chd

enrich_dr <- enrich(genes_dr$Entrez)
enrich_dr
```

## View enrichment of the publication's top-ranked genes

```{r toppgene-compare}
library(IRanges) # CharacterList
library(DFplyr)  # DataFrame support for various dplyr functions

## Show all DataFrame rows of top_results().
orig <- options(showHeadLines = 20L)

## Keep the first row of each category.
top_results <- function(df) {
    df |>
        group_by(Category) |>
        slice(1) |>
        ungroup() |>
        ## Shorten GeneOntology to GO.
        mutate(Category = gsub(x = Category, "GeneOntology", "GO")) |>
        select(Category, ID, Name, GenesSymbol)
}

enrich_chd |>
    filter(any(GenesSymbol %in% CharacterList("HAND2"))) |>
    top_results()

enrich_dr |>
    filter(any(GenesSymbol %in% CharacterList("HLA-DPB1"))) |>
    top_results()

## Restore the previous options; options() returned them as a list.
options(orig)
```

## Convert drug database identifiers to PubChem CIDs

```{r toppgene-pubchem}
enrich_chd |>
    lookup_pubchem()

enrich_dr |>
    lookup_pubchem()
```

## Change default limits of enrichment queries

Use `CategoriesDataFrame()` to change the cut-offs of a query and
thereby limit or expand the number of results.

```{r toppgene-modify-defaults}
## Default cut-offs.
cats <- CategoriesDataFrame()
cats

## Limit to 10 results for each category, and lower PValue for GeneOntology.
cats <- cats |>
    mutate(
        PValue = case_when(
            grepl("GeneOntology", rownames(cats)) ~ 1e-7,
            .default = PValue),
        MaxResults = 10L)
cats

enrich_chd_mod <- enrich(
    genes_chd$Entrez,
    cats)
enrich_chd_mod

## MaxResults limited to at most 10.
enrich_chd_mod |>
    count(Category)

## PValue limited to below 1e-7.
enrich_chd_mod |>
    arrange(desc(PValue)) |>
    filter(grepl(x = Category, "Onto")) |>
    group_by(Category) |>
    slice(1)
```

# Session Info {.unnumbered}

```{r session-info}
sessionInfo()
```

# References {.unnumbered}