How to query the Ontology Lookup Service directly from R and how to create and parse controlled vocabulary.
rols 2.30.2
rols is a Bioconductor package and should hence be installed using the dedicated functionality:
## try http:// if https:// URLs are not supported
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("rols")
To get help, either post your question on the Bioconductor support site or open an issue on the rols GitHub page.
The Ontology Lookup Service (OLS) [1, 2] was originally a spin-off of the PRoteomics IDEntifications database (PRIDE) service, located at the EBI, and is now developed and maintained by the Samples, Phenotypes and Ontologies team at EMBL-EBI.
The OLS provides a REST interface to hundreds of ontologies from a single location with a unified output format. The rols package makes this possible from within R. To do so, it relies on the httr package to query the REST interface, and to access and retrieve data.
There are 251 ontologies available in the OLS, listed in the table below. Their names are used to define which ontology to query.
The rols package is built around a few classes that make it possible to query the OLS and to retrieve, store and manipulate data. Each of these classes is described in more detail in its respective manual page. We start by loading the package.
library("rols")
The Ontology and Ontologies classes can store information about
single or multiple ontologies. The latter can be easily subset using
[ and [[, as one would for lists.
ol <- Ontologies()
ol
## Object of class 'Ontologies' with 251 entries
##    ADO, AGRO ... CCF, CPONT
head(olsNamespace(ol))
## [1] "ado"       "agro"      "aism"      "amphx"     "apo"       "apollo_sv"
ol[["bspo"]]
## Ontology: Biological Spatial Ontology (bspo)  
##   An ontology for respresenting spatial concepts, anatomical axes,
##   gradients, regions, planes, sides and surfaces. These concepts can be
##   used at multiple biological scales and in a diversity of taxa,
##   including plants, animals and fungi. The BSPO is used to provide a
##   source of anatomical location descriptors for logically defining
##   anatomical entity classes in anatomy ontologies.
##    Loaded: 2024-02-15 Updated: 2024-02-15 Version: 2023-05-27 
##    169 terms  236 properties  18 individuals
It is also possible to initialise a single ontology
bspo <- Ontology("bspo")
bspo
## Ontology: Biological Spatial Ontology (bspo)  
##   An ontology for respresenting spatial concepts, anatomical axes,
##   gradients, regions, planes, sides and surfaces. These concepts can be
##   used at multiple biological scales and in a diversity of taxa,
##   including plants, animals and fungi. The BSPO is used to provide a
##   source of anatomical location descriptors for logically defining
##   anatomical entity classes in anatomy ontologies.
##    Loaded: 2024-02-15 Updated: 2024-02-15 Version: 2023-05-27 
##    169 terms  236 properties  18 individuals
Single ontology terms are stored in Term objects. When more terms
need to be manipulated, they are stored as Terms objects. It is easy
to obtain all terms of an ontology of interest, and the resulting
Terms object can be subset using [ and [[, as one would for
lists.
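The list analogy can be illustrated with base R alone. The sketch below uses a plain named list with made-up values as a stand-in for an Ontologies or Terms object (it does not query the OLS): `[` returns a smaller container of the same kind, while `[[` extracts a single element.

```r
## A plain named list standing in for a collection of ontologies or terms.
## The names are OLS namespaces; the values are arbitrary placeholders.
x <- list(ado = 1, agro = 2, bspo = 3)

## `[` keeps the container: the result is still a list, here of length 2
sub <- x[c("ado", "bspo")]
class(sub)
length(sub)

## `[[` extracts one element: the result is the stored value itself
el <- x[["bspo"]]
class(el)
```

By analogy, subsetting a Terms object with `[` gives another Terms object, and `[[` gives a single Term.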
bspotrms <- Terms(bspo) ## or Terms("bspo")
bspotrms
## Object of class 'Terms' with 169 entries
##  From the BSPO ontology
##   BFO:0000002, BFO:0000003 ... IAO:0000409, PATO:0000001
bspotrms[1:10]
## Object of class 'Terms' with 10 entries
##  From the BSPO ontology
##   BFO:0000002, BFO:0000003 ... BFO:0000023, BFO:0000031
bspotrms[["BSPO:0000092"]]
## A Term from the BSPO ontology: BSPO:0000092 
##  Label: anatomical compartment boundary
##   to be merged into CARO
It is also possible to initialise a single term
trm <- Term(bspo, "BSPO:0000092")
termId(trm)
## [1] "BSPO:0000092"
termLabel(trm)
## [1] "anatomical compartment boundary"
It is then possible to extract the ancestors, descendants,
parents and children terms. Each of these functions returns a
Terms object
parents(trm)
## Object of class 'Terms' with 1 entries
##  From the BSPO ontology
## CARO:0000010
children(trm)
## Object of class 'Terms' with 6 entries
##  From the BSPO ontology
##   BSPO:0000094, BSPO:0000093 ... BSPO:0000041, BSPO:0000040
Finally, a single Term or Terms object can be coerced to a
data.frame using as(x, "data.frame").
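For readers unfamiliar with S4 coercion, the following sketch shows the general mechanism behind as(x, "data.frame"). It defines a hypothetical toy class, ToyTerm, which is not the rols Term class, and registers a coercion method with setAs() from the methods package. How rols implements its own coercion may differ.

```r
library(methods)

## Hypothetical toy class standing in for a term; NOT the rols Term class
setClass("ToyTerm",
         representation(id = "character", label = "character"))

## Register a coercion method so that as(x, "data.frame") works for ToyTerm
setAs("ToyTerm", "data.frame", function(from) {
    data.frame(id = from@id, label = from@label,
               stringsAsFactors = FALSE)
})

## Identifier and label copied from the term shown above
toy <- new("ToyTerm",
           id = "BSPO:0000092",
           label = "anatomical compartment boundary")
df <- as(toy, "data.frame")
df
```

The result is an ordinary data.frame with one row per term, which can then be filtered or merged with standard tools.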
Properties (relationships) of single or multiple terms or complete
ontologies can be queried with the Properties method, as briefly
illustrated below.
trm <- Term("uberon", "UBERON:0002107")
trm
## A Term from the UBERON ontology: UBERON:0002107 
##  Label: liver
##   An exocrine gland which secretes bile and functions in metabolism of
##   protein and carbohydrate and fat, synthesizes substances involved in
##   the clotting of the blood, synthesizes vitamin A, detoxifies poisonous
##   substances, stores glycogen, and breaks down worn-out erythrocytes[GO].
p <- Properties(trm)
p
## Object of class 'Properties' with 158 entries
##  From the UBERON ontology
##   hepatobiliary system, exocrine system ... liver serosa, liver subserosa
p[[1]]
## A Property from the UBERON ontology: UBERON:0002423 
##  Label: hepatobiliary system
termLabel(p[[1]])
## [1] "hepatobiliary system"
A researcher might be interested in the trans-Golgi network. Searching
the OLS is handled by the OlsSearch and olsSearch
classes/functions. The first step is to define the search query with
OlsSearch, as shown below. This creates a search object of class
OlsSearch that stores the query and its parameters. It records the
number of requested results (default is 20) and the total number of
possible results (there are 16856 results across all
ontologies, in this case). At this stage, the results have not yet
been downloaded, as shown by the 0 responses.
OlsSearch(q = "trans-golgi network")
## Object of class 'OlsSearch':
##   query: trans-golgi network 
##   requested: 20 (out of 16856)
##   response(s): 0
16856 results are probably too many to be
relevant. Below we show how to perform an exact search by setting
exact = TRUE, and limiting the search to the GO ontology by
specifying ontology = "GO", or doing both.
OlsSearch(q = "trans-golgi network", exact = TRUE)
## Object of class 'OlsSearch':
##   query: trans-golgi network 
##   requested: 20 (out of 215)
##   response(s): 0
OlsSearch(q = "trans-golgi network", ontology = "GO")
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 20 (out of 1109)
##   response(s): 0
OlsSearch(q = "trans-golgi network", ontology = "GO", exact = TRUE)
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 20 (out of 25)
##   response(s): 0
One can use the rows argument to set the number of desired results.
OlsSearch(q = "trans-golgi network", ontology = "GO", rows = 200)
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 200 (out of 1109)
##   response(s): 0
See ?OlsSearch for details about retrieving many results.
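To make the role of these arguments more concrete, here is a rough sketch of how a query, an ontology restriction, the exact flag and the number of rows could be assembled into a REST request URL. The helper build_search_url is hypothetical and not part of rols. The base URL and the parameter names (q, exact, rows, ontology) are assumptions about the OLS search endpoint and should be checked against the OLS documentation. No request is sent.

```r
## Hypothetical helper (not part of rols): builds a search URL only.
## The base URL and parameter names are assumptions about the OLS REST API.
build_search_url <- function(q, ontology = NULL, exact = FALSE, rows = 20L,
                             base = "https://www.ebi.ac.uk/ols4/api/search") {
    params <- list(q = q,
                   exact = tolower(as.character(exact)),
                   rows = rows)
    if (!is.null(ontology))
        params$ontology <- tolower(ontology)
    ## Percent-encode each value, then join them as key=value pairs
    enc <- vapply(params,
                  function(v) utils::URLencode(as.character(v),
                                               reserved = TRUE),
                  character(1))
    paste0(base, "?", paste(names(enc), enc, sep = "=", collapse = "&"))
}

url <- build_search_url("trans-golgi network", ontology = "GO",
                        exact = TRUE)
url
```

In practice, OlsSearch and olsSearch take care of building and sending the request, so such a helper would only be useful for debugging or for understanding what is being asked of the service.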
Let’s proceed with the exact search and retrieve the results. Although
215 results match the query, only the 20 requested results (the
default) are retrieved. The olsSearch function updates the
previously created object (called qry below) by adding the results
to it.
qry <- OlsSearch(q = "trans-golgi network", exact = TRUE)
(qry <- olsSearch(qry))
## Object of class 'OlsSearch':
##   query: trans-golgi network 
##   requested: 20 (out of 215)
##   response(s): 20
We can now transform this search result object into a fully fledged
Terms object or a data.frame.
(qtrms <- as(qry, "Terms"))
## Object of class 'Terms' with 20 entries
##  From the NCIT, PR, GO, ZP, PW ontologies
##   NCIT:C33802, PR:O43493 ... GO:0042147, PW:0000426
str(qdrf <- as(qry, "data.frame"))
## 'data.frame':    20 obs. of  8 variables:
##  $ iri            : chr  "http://purl.obolibrary.org/obo/NCIT_C33802" "http://purl.obolibrary.org/obo/PR_O43493" "http://purl.obolibrary.org/obo/GO_0005802" "http://purl.obolibrary.org/obo/GO_0032588" ...
##  $ ontology_name  : chr  "ncit" "pr" "go" "go" ...
##  $ ontology_prefix: chr  "NCIT" "PR" "GO" "GO" ...
##  $ short_form     : chr  "NCIT_C33802" "PR_O43493" "GO_0005802" "GO_0032588" ...
##  $ description    :List of 20
##   ..$ : chr "A network of membrane components where vesicles bud off the Golgi apparatus to bring proteins, membranes and ot"| __truncated__
##   ..$ : chr  "A trans-Golgi network integral membrane protein 2 that is encoded in the genome of human." "Category=organism-gene."
##   ..$ : chr  "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__ "There are different opinions about whether the TGN should be considered part of the Golgi apparatus or not. We "| __truncated__
##   ..$ : chr "The lipid bilayer surrounding any of the compartments that make up the trans-Golgi network."
##   ..$ : chr "Abnormal(ly) mislocalised (of) enterocyte of trans-Golgi network."
##   ..$ : chr "A vesicle that mediates transport between the trans-Golgi network and other parts of the cell."
##   ..$ : chr "The side (leaflet) of the trans-Golgi network transport vesicle membrane that faces the lumen."
##   ..$ : chr "The side (leaflet) of the trans-Golgi network transport vesicle membrane that faces the cytoplasm."
##   ..$ : chr "The lipid bilayer surrounding a vesicle transporting substances between the trans-Golgi network and other parts of the cell."
##   ..$ : chr "The volume enclosed within the membrane of a trans-Golgi network transport vesicle."
##   ..$ : chr "A clathrin coat found on a vesicle of the trans-Golgi network."
##   ..$ : chr  "A trans-Golgi network integral membrane protein 1 that is encoded in the genome of rat." "Category=organism-gene."
##   ..$ : chr "A process which results in the assembly, arrangement of constituent parts, or disassembly of a trans-Golgi network membrane."
##   ..$ : chr  "A trans-Golgi network integral membrane protein 1 that is encoded in the genome of mouse." "Category=organism-gene."
##   ..$ : chr "The directed movement of substances from the plasma membrane back to the trans-Golgi network, mediated by vesicles."
##   ..$ : chr "The directed movement of substances, in membrane-bounded vesicles, from the trans-Golgi network to the recycling endosomes."
##   ..$ : chr "The directed movement of proteins from the Golgi to the plasma membrane in transport vesicles that move from th"| __truncated__
##   ..$ : chr "The directed movement of substances from the vacuole to the trans-Golgi network; this occurs in yeast via the p"| __truncated__
##   ..$ : chr "The directed movement of membrane-bounded vesicles from endosomes back to the trans-Golgi network where they ar"| __truncated__
##   ..$ : chr "In the secretory pathway, protein sorting, mainly in trans-Golgi Network (TGN), but also in other compartments,"| __truncated__
##  $ label          : chr  "Trans-Golgi Network" "trans-Golgi network integral membrane protein 2 (human)" "trans-Golgi network" "trans-Golgi network membrane" ...
##  $ obo_id         : chr  "NCIT:C33802" "PR:O43493" "GO:0005802" "GO:0032588" ...
##  $ type           : chr  "class" "class" "class" "class" ...
In this case, we can see that we actually retrieve the same term used across different ontologies. In such cases, it might be useful to keep only non-redundant term instances. Here, this would have been equivalent to searching the ncit, pr, go, zp and pw ontologies.
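To see how the 20 results are distributed across ontologies, one could tabulate the ontology_name column of the data.frame. The sketch below uses only base R. The vector onto is typed in by hand from the search result above rather than taken from a live query.

```r
## Ontology names of the 20 results, copied from the search output above
onto <- c("ncit", "pr", "go", "go", "zp", "go", "go", "go", "go", "go",
          "go", "pr", "go", "pr", "go", "go", "go", "go", "go", "pw")

## Number of results per ontology, most frequent first
counts <- sort(table(onto), decreasing = TRUE)
counts

## Distinct ontologies, in order of first appearance
unique(onto)
```

With a live result, the same tabulation could be applied to qdrf$ontology_name.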
qtrms <- unique(qtrms)
termOntology(qtrms)
## NCIT:C33802   PR:O43493  GO:0005802  GO:0032588  ZP:0142408  GO:0030140 
##      "ncit"        "pr"        "go"        "go"        "zp"        "go" 
##  GO:0098540  GO:0098541  GO:0012510  GO:0098564  GO:0030130   PR:P19814 
##        "go"        "go"        "go"        "go"        "go"        "pr" 
##  GO:0098629   PR:Q62313  GO:0035526  GO:0044795  GO:0043001  GO:0045018 
##        "go"        "pr"        "go"        "go"        "go"        "go" 
##  GO:0042147  PW:0000426 
##        "go"        "pw"
termNamespace(qtrms)
## $`NCIT:C33802`
## NULL
## 
## $`PR:O43493`
## [1] "protein"
## 
## $`GO:0005802`
## [1] "cellular_component"
## 
## $`GO:0032588`
## [1] "cellular_component"
## 
## $`ZP:0142408`
## NULL
## 
## $`GO:0030140`
## [1] "cellular_component"
## 
## $`GO:0098540`
## [1] "cellular_component"
## 
## $`GO:0098541`
## [1] "cellular_component"
## 
## $`GO:0012510`
## [1] "cellular_component"
## 
## $`GO:0098564`
## [1] "cellular_component"
## 
## $`GO:0030130`
## [1] "cellular_component"
## 
## $`PR:P19814`
## [1] "protein"
## 
## $`GO:0098629`
## [1] "biological_process"
## 
## $`PR:Q62313`
## [1] "protein"
## 
## $`GO:0035526`
## [1] "biological_process"
## 
## $`GO:0044795`
## [1] "biological_process"
## 
## $`GO:0043001`
## [1] "biological_process"
## 
## $`GO:0045018`
## [1] "biological_process"
## 
## $`GO:0042147`
## [1] "biological_process"
## 
## $`PW:0000426`
## [1] "pathway"
Below, we execute the same query using the GO.db package.
library("GO.db")
GOTERM[["GO:0005802"]]
## GOID: GO:0005802
## Term: trans-Golgi network
## Ontology: CC
## Definition: The network of interconnected tubular and cisternal
##     structures located within the Golgi apparatus on the side distal to
##     the endoplasmic reticulum, from which secretory vesicles emerge.
##     The trans-Golgi network is important in the later stages of protein
##     secretion where it is thought to play a key role in the sorting and
##     targeting of secreted proteins to the correct destination.
## Synonym: TGN
## Synonym: trans Golgi network
## Synonym: Golgi trans face
## Synonym: Golgi trans-face
## Synonym: late Golgi
## Synonym: maturing face
## Synonym: trans face
It is possible to observe different results with rols and GO.db, as a result of the different ways they access the data. rols or biomaRt perform direct online queries, while GO.db and other annotation packages use database snapshots that are updated every release.
Both approaches have advantages. While online queries provide
the most up-to-date information, such approaches rely on network
availability and quality. If reproducibility is a major issue, the
version of the database to be queried can easily be controlled with
off-line approaches. In the case of rols, although the
load date of a specific ontology can be queried with olsVersion, it
is not possible to query a specific version of an ontology.
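One possible way to mitigate this, sketched below with base R only, is to save the result of an online query together with its retrieval date, and to re-use that local snapshot in later analyses. The data.frame res is a hand-made stand-in for a real query result (its two rows are copied from the search output above), and the temporary file is used purely for illustration.

```r
## Stand-in for a result retrieved online; values copied from the output above
res <- data.frame(obo_id = c("GO:0005802", "GO:0032588"),
                  label = c("trans-Golgi network",
                            "trans-Golgi network membrane"),
                  stringsAsFactors = FALSE)

## Record when the snapshot was taken
attr(res, "retrieved") <- Sys.Date()

## Save the snapshot, then read it back later instead of querying again
f <- tempfile(fileext = ".rds")
saveRDS(res, f)
cached <- readRDS(f)
identical(cached, res)
```

In a real project, the file would be stored alongside the analysis rather than in a temporary location.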
rols 2.0 has substantially changed. The table
below shows some correspondence between the old and new interface,
but not every function has a direct equivalent. The new interface relies on the
Ontology/Ontologies, Term/Terms and OlsSearch classes, which
need to be instantiated and can then be queried, as described above.
| version < 1.99 | version >= 1.99 | 
|---|---|
| ontologyLoadDate | olsLoaded and olsUpdated | 
| ontologyNames | Ontologies | 
| olsVersion | olsVersion | 
| allIds | terms | 
| isIdObsolete | isObsolete | 
| rootId | olsRoot | 
| olsQuery | OlsSearch and olsSearch | 
Not all functionality is currently available. If there is anything that you need but that is not available in the new version, please contact the maintainer by opening an issue on the package development site.
rols version >= 2.99 has been refactored to use the OLS4 REST API. The changes concern httr, Term() and Terms(), and Properties().
The CVParam class is used to handle controlled vocabulary. It can be
used for user-defined parameters
CVParam(name = "A user param", value = "the value")
## [, , A user param, the value]
or official controlled vocabulary (which triggers a query to the OLS service)
CVParam(label = "MS", accession = "MS:1000073")
## [MS, MS:1000073, electrospray ionization, ]
See ?CVParam for more details and examples.
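The printed form of a CVParam, [label, accession, name, value], is easy to take apart by hand. The sketch below is a minimal base R parser for such strings. The helper parse_cv_string is hypothetical and not part of rols, and it assumes that none of the four fields contains a comma. See ?CVParam for what the package itself provides.

```r
## Hypothetical helper (not part of rols): parse "[label, accession, name, value]"
## Assumes that no field contains a comma.
parse_cv_string <- function(x) {
    stopifnot(is.character(x), length(x) == 1L,
              grepl("^\\[.*\\]$", x))
    ## Drop the enclosing square brackets
    inner <- substr(x, 2L, nchar(x) - 1L)
    ## Split on commas and trim surrounding whitespace
    fields <- trimws(strsplit(inner, ",", fixed = TRUE)[[1]])
    ## strsplit() drops a trailing empty field, so pad to four entries
    length(fields) <- 4L
    fields[is.na(fields)] <- ""
    names(fields) <- c("label", "accession", "name", "value")
    fields
}

## The two strings below are the printed CVParam objects shown above
cv <- parse_cv_string("[MS, MS:1000073, electrospray ionization, ]")
cv
usr <- parse_cv_string("[, , A user param, the value]")
usr
```

A user-defined parameter has an empty label and accession, whereas an official term has an empty value, which gives a simple way to tell the two apart after parsing.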
## R version 4.3.2 Patched (2023-11-13 r85521)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] DT_0.31              rols_2.30.2          GO.db_3.18.0        
## [4] AnnotationDbi_1.64.1 IRanges_2.36.0       S4Vectors_0.40.2    
## [7] Biobase_2.62.0       BiocGenerics_0.48.1  BiocStyle_2.30.0    
## 
## loaded via a namespace (and not attached):
##  [1] rappdirs_0.3.3          sass_0.4.8              bitops_1.0-7           
##  [4] RSQLite_2.3.5           digest_0.6.34           magrittr_2.0.3         
##  [7] evaluate_0.23           bookdown_0.37           fastmap_1.1.1          
## [10] blob_1.2.4              jsonlite_1.8.8          GenomeInfoDb_1.38.6    
## [13] DBI_1.2.2               BiocManager_1.30.22     httr_1.4.7             
## [16] crosstalk_1.2.1         Biostrings_2.70.2       httr2_1.0.0            
## [19] jquerylib_0.1.4         cli_3.6.2               rlang_1.1.3            
## [22] crayon_1.5.2            XVector_0.42.0          ellipsis_0.3.2         
## [25] bit64_4.0.5             cachem_1.0.8            yaml_2.3.8             
## [28] tools_4.3.2             memoise_2.0.1           GenomeInfoDbData_1.2.11
## [31] curl_5.2.0              vctrs_0.6.5             R6_2.5.1               
## [34] png_0.1-8               lifecycle_1.0.4         zlibbioc_1.48.0        
## [37] KEGGREST_1.42.0         htmlwidgets_1.6.4       bit_4.0.5              
## [40] pkgconfig_2.0.3         bslib_0.6.1             glue_1.7.0             
## [43] xfun_0.42               knitr_1.45              htmltools_0.5.7        
## [46] rmarkdown_2.25          compiler_4.3.2          RCurl_1.98-1.14