--- title: "Keywords from the UniProt database" author: "Zuguang Gu (z.gu@dkfz.de)" date: '`r Sys.Date()`' output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Keywords from the UniProt database} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- UniProt database provides a list of controlled vocabulary represented as keywords for genes or proteins (https://www.uniprot.org/keywords/). This is useful for summarizing gene functions in a compact way. This package provides data of keywords hierarchy and gene-keyword relations. First load the package: ```{r} library(UniProtKeywords) ``` First the release and source information of the data: ```{r} UniProtKeywords ``` The package has five data objects. The first contains basic information of every keyword term: ```{r} data(kw_terms) length(kw_terms) kw_terms[["Cell cycle"]] ``` The other four contain hierarchical structures of the keyword terms: ```{r} data(kw_parents) kw_parents[1:2] data(kw_children) kw_children[1:2] data(kw_ancestors) kw_ancestors[1:2] data(kw_offspring) kw_offspring[1:2] ``` The **UniProtKeywords** package has also compiled genesets of keywords for some species, which can get by the function `load_keyword_genesets()`. The argument is the taxon ID of a species. The full set of supported organisms can be found in the document of `load_keyword_genesets()`. ```{r} gl = load_keyword_genesets(9606) gl[3:4] # because gl[1:2] has a very long output, here we print gl[3:4] ``` Argument `as_table` can be set to `TRUE`, then `load_keyword_genesets()` returns a two-column data frame. ```{r} tb = load_keyword_genesets(9606, as_table = TRUE) head(tb) ``` To make it more flexible, you can also provide the Latin name or the common name of the species. ```{r} gl2 = load_keyword_genesets("mouse") gl2[3:4] ``` We can simply check some statistics. 1. Sizes of keyword genesets: ```{r, fig.width = 7, fig.height = 5} plot(table(sapply(gl, length)), log = "x", xlab = "Size of keyword genesets", ylab = "Number of keywords" ) ``` 2. Numbers of words in keywords: ```{r, fig.width = 7, fig.height = 5} plot(table(sapply(gregexpr(" |-|/", names(gl)), length)), xlab = "Number of words in keywords", ylab = "Number of keywords" ) ``` 3. Numbers of characters in keywords: ```{r, fig.width = 7, fig.height = 5} plot(table(nchar(names(gl))), xlab = "Number of characters in keywords", ylab = "Number of keywords" ) ``` Session info: ```{r} sessionInfo() ```