---
title: "UCSCXenaTools: an R package for Accessing Genomics Data from UCSC Xena platform, from Cancer Multi-omics to Single-cell RNA-seq"
author: "Shixiang Wang \\ ShanghaiTech University"
date: "`r Sys.Date()`"
output:
  prettydoc::html_pretty:
    toc: true
    theme: cayman
    highlight: github
  pdf_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{Basic usage}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

**UCSCXenaTools** is an R package for accessing genomics data from the UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. Public omics data from UCSC Xena are served through [**multiple turn-key Xena Hubs**](https://xenabrowser.net/datapages/), a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. The databases are normalized so they can be combined, linked, filtered, explored and downloaded.

**Who is the target audience and what are the scientific applications of this package?**

* Target audience: cancer and clinical researchers, bioinformaticians
* Applications: genomic and clinical analyses

## Installation

Install the stable release from CRAN with:

```{r, eval=FALSE}
install.packages("UCSCXenaTools")
```

You can also install the development version of **UCSCXenaTools** from GitHub with:

```{r gh-installation, eval = FALSE}
# install.packages("remotes")
remotes::install_github("ropensci/UCSCXenaTools")
```

If you want to build the vignette locally, add two options:

```{r, eval=FALSE}
remotes::install_github("ropensci/UCSCXenaTools",
  build_vignettes = TRUE,
  dependencies = TRUE
)
```

The minimum version required to run this vignette is `1.2.4`.

[GitHub Issues](https://github.com/ropensci/UCSCXenaTools/issues) is the place to discuss any problem.

## Data Hub List

All datasets are available at <https://xenabrowser.net/datapages/>. Currently, **UCSCXenaTools** supports the following UCSC Xena data hubs:
* UCSC Public Hub
* TCGA Hub
* GDC Xena Hub
* ICGC Xena Hub
* Pan-Cancer Atlas Hub
* UCSC Toil RNAseq Recompute Compendium Hub
* PCAWG Xena Hub
* ATAC-seq Hub
* Single Cell Xena Hub
* Kids First Xena Hub
* Treehouse Xena Hub

You can update the dataset list to the newest version of UCSC Xena by hand with the `XenaDataUpdate()` function, then restart R and run `library(UCSCXenaTools)`.

If the URL of any data hub changes or a new data hub comes online, please let me know by email or by [opening an issue on GitHub](https://github.com/ropensci/UCSCXenaTools/issues).

## Usage

Downloading UCSC Xena datasets and loading them into R with **UCSCXenaTools** is a five-step workflow: `generate`, `filter`, `query`, `download` and `prepare`, implemented by the `XenaGenerate()`, `XenaFilter()`, `XenaQuery()`, `XenaDownload()` and `XenaPrepare()` functions, respectively. They are straightforward to use and combine well with other packages such as `dplyr`.

To show the basic usage of **UCSCXenaTools**, we will download the clinical data of LUNG, LUAD and LUSC from the TCGA (hg19 version) data hub.

### XenaData data.frame

**UCSCXenaTools** ships a built-in `data.frame`, `XenaData`, which records information on all datasets in the UCSC Xena Data Hubs and is used to generate instances of the `XenaHub` class. `XenaData` is available once `UCSCXenaTools` is loaded into R.

```{r}
library(UCSCXenaTools)
data(XenaData)

head(XenaData)
```

### Workflow

Select datasets.

```{r}
# The filter options of XenaFilter() accept regular expressions
XenaGenerate(subset = XenaHostNames == "tcgaHub") %>%
  XenaFilter(filterDatasets = "clinical") %>%
  XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo

df_todo
```

Sometimes we only know some keywords. `XenaScan()` scans all rows of `XenaData` to detect whether the keywords exist.

```{r}
x1 = XenaScan(pattern = 'Blood')
x2 = XenaScan(pattern = 'LUNG', ignore.case = FALSE)

x1 %>% XenaGenerate()
x2 %>% XenaGenerate()
```

Query and download.
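Large datasets can take a while to download. Assuming that `XenaDownload()` hands its transfers to base R's `utils::download.file()` (its `method` and `extra` arguments suggest so, but this is an assumption), the base R `timeout` option, which defaults to 60 seconds, limits how long a transfer may run. A minimal sketch for raising it temporarily and restoring the previous value afterwards:

```r
# Raise R's download timeout (in seconds), keeping the old value for restoring.
# 600 is an arbitrary example value, not a package recommendation.
old_opts <- options(timeout = max(600, getOption("timeout")))
getOption("timeout")

# ... run XenaQuery(df_todo) %>% XenaDownload() here ...

# Restore the previous timeout setting
options(old_opts)
```

The `timeout` option governs R's own download methods; with `method = "curl"` or `method = "wget"`, the external tool's behaviour may apply instead.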
```{r, eval=FALSE}
XenaQuery(df_todo) %>%
  XenaDownload() -> xe_download
```

**For researchers in China, the Hiplot team has deployed several Xena mirror sites (`https://xena.hiplot.com.cn/`) in Shanghai. You can set `options(use_hiplot = TRUE)` before the query step to speed up both querying and downloading.**

```{r}
options(use_hiplot = TRUE)
XenaQuery(df_todo) %>%
  XenaDownload() -> xe_download
```

Prepare the data in R for analysis.

```{r}
cli = XenaPrepare(xe_download)
class(cli)
names(cli)
```

### Browse datasets

Create two XenaHub objects:

* `to_browse` - a XenaHub object containing one cohort and one dataset.
* `to_browse2` - a XenaHub object containing two cohorts and two datasets.

```{r}
XenaGenerate(subset = XenaHostNames == "tcgaHub") %>%
  XenaFilter(filterDatasets = "clinical") %>%
  XenaFilter(filterDatasets = "LUAD") -> to_browse

to_browse

XenaGenerate(subset = XenaHostNames == "tcgaHub") %>%
  XenaFilter(filterDatasets = "clinical") %>%
  XenaFilter(filterDatasets = "LUAD|LUSC") -> to_browse2

to_browse2
```

The `XenaBrowse()` function opens dataset/cohort links in your default web browser. By default, it is limited to a single dataset/cohort to prevent opening too many links at once.

```{r, eval=FALSE}
# This will open your web browser
XenaBrowse(to_browse)
XenaBrowse(to_browse, type = "cohort")
```

```{r, error=TRUE}
# This will throw an error because more than one link would be opened
XenaBrowse(to_browse2)
XenaBrowse(to_browse2, type = "cohort")
```

When you are sure you want to open multiple links, set the `multiple` option to `TRUE`.

```{r, eval=FALSE}
XenaBrowse(to_browse2, multiple = TRUE)
XenaBrowse(to_browse2, type = "cohort", multiple = TRUE)
```

## More usages

The core functionality has been described above. Further usages are documented on my website rather than here, because package checks sometimes fail due to network problems:
- [Introduction and basic usage of UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-intro/) - [PDF](https://shixiangwang.github.io/home/en/tools/ucscxenatools-intro.pdf)
- [APIs of UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-api/) - [PDF](https://shixiangwang.github.io/home/en/tools/ucscxenatools-api.pdf)

Read [Obtain RNAseq Values for a Specific Gene in Xena Database](https://shixiangwang.github.io/home/en/tools/ucscxenatools-single-gene/) to see how to get values for a single gene.

A use case of survival analysis based on single-gene expression has been published on rOpenSci; please read [UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis](https://ropensci.org/technotes/2019/09/06/ucscxenatools-surv/).

## QA

### How to resume a download from a breakpoint

Thanks to the UCSC Xena team, a download can now be resumed from a breakpoint by calling `XenaDownload()` with the `method` and `extra` arguments specified. Note that the corresponding `wget` or `curl` command must be installed on your OS and be findable by R.

The following code gives a test example; the data can be viewed on this [web page](https://xenabrowser.net/datapages/?dataset=TcgaTargetGtex_expected_count&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443).

```r
library(UCSCXenaTools)

# Check that R can find curl / wget (an empty string means "not found")
Sys.which(c("curl", "wget"))

xe = XenaGenerate(subset = XenaDatasets == "TcgaTargetGtex_expected_count")
xe
xq = XenaQuery(xe)

# You cannot resume from a breakpoint in the default mode
XenaDownload(xq, destdir = "~/test/", force = TRUE)
# You can do it with the 'curl' command
XenaDownload(xq, destdir = "~/test/", method = "curl", extra = "-C -", force = TRUE)
# You can do it with the 'wget' command
XenaDownload(xq, destdir = "~/test/", method = "wget", extra = "-c", force = TRUE)
```

## Citation

Cite me using the following paper.

```
Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627

# For BibTeX

@article{Wang2019UCSCXenaTools,
  journal = {Journal of Open Source Software},
  doi = {10.21105/joss.01627},
  issn = {2475-9066},
  number = {40},
  publisher = {The Open Journal},
  title = {The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq},
  url = {http://dx.doi.org/10.21105/joss.01627},
  volume = {4},
  author = {Wang, Shixiang and Liu, Xuesong},
  pages = {1627},
  date = {2019-08-05},
  year = {2019},
  month = {8},
  day = {5},
}
```

Cite UCSC Xena using the following paper.

```
Goldman, Mary, et al. "The UCSC Xena Platform for cancer genomics data visualization and interpretation." BioRxiv (2019): 326470.
```

## Acknowledgments

This package is based on [XenaR](https://github.com/mtmorgan/XenaR); thanks to [Martin Morgan](https://github.com/mtmorgan) for his work.