--- title: bedbaser author: - name: A Wokaty affiliation: CUNY School of Public Health email: jennifer.wokaty@sph.cuny.edu package: bedbaser output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{bedbaser} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction ```{r library, include=FALSE} library(rtracklayer) library(R.utils) library(liftOver) ``` `r Biocpkg("bedbaser")` is an R API client for [BEDbase](https://bedbase.org) that provides access to the [bedhost API](https://api.bedbase.org) and includes convenience functions, such as to create GRanges and GRangesList objects. # Install bedbaser and create a BEDbase instance Install `r Biocpkg("bedbaser")` using `r CRANpkg("BiocManager")`. ```{r install, eval=FALSE} if (!"BiocManager" %in% rownames(installed.packages())) { install.packages("BiocManager") } BiocManager::install("bedbaser") ``` Load the package and create a BEDbase instance, optionally setting the cache to `cache_path`. If `cache_path` is not set, `r Biocpkg("bedbaser")` will choose the default location. ```{r bedbase} library(bedbaser) bedbase <- BEDbase(tempdir()) ``` `r Biocpkg("bedbaser")` can use the same cache as [geniml](https://docs.bedbase.org/geniml/)'s BBClient by setting the `cache_path` to the same location. It will create the following structure: ``` cache_path bedfiles a/f/afile.bed.gz bedsets a/s/aset.txt ``` # Convenience Functions `r Biocpkg("bedbaser")` includes convenience functions prefixed with *bb_* to facilitate finding BED files, exploring their metadata, downloading files, and creating `GRanges` objects. ## Find a BED file or BEDset Use `bb_list_beds()` and `bb_list_bedsets()` to browse available resources in BEDbase. Both functions display the id and names of BED files and BEDsets. An id can be used to access a specific resource. ```{r bb_list_beds} bb_list_beds(bedbase) bb_list_bedsets(bedbase) ``` ## Examine metadata Use `bb_metadata()` to learn more about a BED or BEDset associated with an id. ```{r bb_metadata} ex_bed <- bb_example(bedbase, "bed") md <- bb_metadata(bedbase, ex_bed$id) head(md) ``` ## Show BED files in BEDset Use `bb_beds_in_bedset()` to display the id of BEDs in a BEDset. ```{r bb_beds_in_bedset} bb_beds_in_bedset(bedbase, "excluderanges") ``` ## Search for a BED file by keyword Search for BED files by keywords. `bb_bed_text_search()` returns all BED files scored against a keyword query. ```{r bb_search} bb_bed_text_search(bedbase, "cancer", limit = 10) ``` ## Import a BED into a GRanges object Create a GRanges object with a BED id with `bb_to_granges`, which downloads and imports a BED file using `r Biocpkg("rtracklayer")`. ```{r bb_to_granges} ex_bed <- bb_example(bedbase, "bed") # Allow bedbaser to assign column names and types bb_to_granges(bedbase, ex_bed$id, quietly = FALSE) ``` For BEDX+Y formats, a named list with column types may be passed through `extra_cols` if the column name and type are known. Otherwise, `bb_to_granges` guesses the column types and assigns column names. ```{r bb_to_granges_manual, eval=FALSE} # Manually assign column name and type using `extra_cols` bb_to_granges(bedbase, ex_bed$id, extra_cols = c("column_name" = "character")) ``` `bb_to_granges` automatically assigns the column names and types for broad peak and narrow peak files. ```{r bb_to_granges_narrowpeak, message=FALSE} bed_id <- "bbad85f21962bb8d972444f7f9a3a932" md <- bb_metadata(bedbase, bed_id) head(md) bb_to_granges(bedbase, bed_id) ``` `bb_to_granges` can also import big BED files. ```{r bb_to_granges_big_bed} bed_id <- "ffc1e5ac45d923135500bdd825177356" bb_to_granges(bedbase, bed_id, "bigbed", quietly = FALSE) ``` ## Import a BEDset into a GRangesList Create a GRangesList given a BEDset id with `bb_to_grangeslist`. ```{r bb_to_grangeslist, message=FALSE} bedset_id <- "lola_hg38_ucsc_features" bb_to_grangeslist(bedbase, bedset_id) ``` ## Save a BED file Save BED files or BEDsets with `bb_save`: ```{r bb_save} bb_save(bedbase, ex_bed$id, tempdir()) ``` # Accessing BEDbase API endpoints Because `r Biocpkg("bedbaser")` uses the `r Biocpkg("AnVIL")` Service class, it's possible to access any endpoint of the [BEDbase API](https://api.bedbase.org/v1/docs). ```{r operations} show(bedbase) ``` For example, to access a BED file's stats, access the endpoint with `$` and use `r CRANpkg("httr")` to get the result. `show` will display information about the endpoint. ```{r service_class_example} library(httr) show(bedbase$get_bed_stats_v1_bed__bed_id__metadata_stats_get) id <- "bbad85f21962bb8d972444f7f9a3a932" rsp <- bedbase$get_bed_stats_v1_bed__bed_id__metadata_stats_get(id) content(rsp) ``` # Example: Change genomic coordinate system with liftOver Given a BED id, we can use `r Biocpkg("liftOver")` to convert one genomic coordinate system to another. Install `r Biocpkg("liftOver")` and `r Biocpkg("rtracklayer")` then load the packages. ```{r liftOver, eval=FALSE} if (!"BiocManager" %in% rownames(installed.packages())) { install.packages("BiocManager") } BiocManager::install(c("liftOver", "rtracklayer")) library(liftOver) library(rtracklayer) ``` Create a GRanges object from a [mouse genome](https://bedbase.org/bed/7816f807ffe1022f438e1f5b094acf1a). Create a BEDbase Service instance. Use the instance to create a GRanges object from the BEDbase `id`. ```{r convertCoordinates_createGRangesObject, message=FALSE} id <- "7816f807ffe1022f438e1f5b094acf1a" bedbase <- BEDbase() gro <- bb_to_granges(bedbase, id) gro ``` Download the [chain file](https://hgdownload.cse.ucsc.edu/downloads.html#liftover) from UCSC. ```{r convertCoordinates_getTheChain, message=FALSE} chain_url <- paste0( "https://hgdownload.cse.ucsc.edu/goldenPath/mm10/liftOver/", "mm10ToMm39.over.chain.gz" ) tmpdir <- tempdir() gz <- file.path(tmpdir, "mm10ToMm39.over.chain.gz") download.file(chain_url, gz) gunzip(gz, remove = FALSE) ``` Import the chain, set the sequence levels style, and set the genome for the GRanges object. ```{r convertCoordinates_convert, message=FALSE} ch <- import.chain(file.path(tmpdir, "mm10ToMm39.over.chain")) seqlevelsStyle(gro) <- "UCSC" gro39 <- liftOver(gro, ch) gro39 <- unlist(gro39) genome(gro39) <- "mm39" gro39 ``` # SessionInfo() ```{r sessionInfo} sessionInfo() ```