---
title: "JASPAR2020"
output: BiocStyle::html_document
author:
name: Damir Baranasic
affiliation: Imperial College London, Faculty of Medicine, Institute of Clinical Sciences, Hammersmith Campus, Du Cane Road, W12 0NN, London
bibliography: JASPAR2020.bib
vignette: >
%\VignetteIndexEntry{JASPAR2020}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r, echo=FALSE, results="hide", warning=FALSE}
suppressPackageStartupMessages({
library(JASPAR2020)
library(TFBSTools)
})
```
# Introduction
JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant
transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs
across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE
collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for
nematodes, 10 for insects, and 7 for fungi), and 157 PFMs were updated (125 for vertebrates,
28 for plants, and 3 for insects). These new profiles represent an 18% expansion compared to
the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding
profiles for which our curators did not find orthogonal supporting evidence in the literature. This
collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding
profiles.
The easiest way to use the JASPAR2020 data package [@Fornes:2019] is by `TFBSTools` package interface [@Tan:2016], which provides functions to retrieve and manipulate data from the JASPAR database. This vignette demonstrates how to use those functions.
```{r setup}
library(JASPAR2020)
library(TFBSTools)
```
# Retrieving matrices from JASPAR2020 by ID or name
Matrices from JASPAR can be retrieved using either `getMatrixByID` or ` getMatrixByName` function by providing a matrix ID or a matrix name from JASPAR, respectively. These functions accept either a single element as the ID/name parameter or a vector of values. The former case returns a `PFMatrix` object, while the later one returns a `PFMatrixList` with multiple `PFMatrix` objects.
```{r example_name_id, tidy=TRUE}
## the user assigns a single matrix ID to the argument ID
pfm <- getMatrixByID(JASPAR2020, ID="MA0139.1")
## the function returns a PFMatrix object
pfm
```
The user can utilise the PFMatrix object for further analysis and visualisation. Here is an example of how to plot a sequence logo of a given matrix using functions available in `TFBSTools` package.
```{r seq_logo}
seqLogo(toICM(pfm))
```
```{r multiple_matrix_id}
## the user assigns multiple matrix IDs to the argument ID
pfmList <- getMatrixByID(JASPAR2020, ID=c("MA0139.1", "MA1102.1"))
## the function returns a PFMatrix object
pfmList
## PFMatrixList can be subsetted to extract enclosed PFMatrix objects
pfmList[[2]]
```
`getMatrixByName` retrieves matrices by name. If multiple matrix names are supplied, the function returns a PFMatrixList object.
```{r getMatrixByName_example}
pfm <- getMatrixByName(JASPAR2020, name="Arnt")
pfm
pfmList <- getMatrixByName(JASPAR2020, name=c("Arnt", "Ahr::Arnt"))
pfmList
```
# The use of filtering criteria
The `getMatrixSet` function fetches all matrices that match criteria defined by the named arguments, and it returns a `PFMatrixList` object.
```{r example_set, tidy=TRUE}
## select all matrices found in a specific species and constructed from the SELEX experiment
opts <- list()
opts[["species"]] <- 9606
opts[["type"]] <- "SELEX"
opts[["all_versions"]] <- TRUE
PFMatrixList <- getMatrixSet(JASPAR2020, opts)
PFMatrixList
## retrieve all matrices constructed from SELEX experiment
opts2 <- list()
opts2[["type"]] <- "SELEX"
PFMatrixList2 <- getMatrixSet(JASPAR2020, opts2)
PFMatrixList2
```
Additional details about TFBS matrix analysis can be found in the [TFBSTools](https://bioconductor.org/packages/release/bioc/html/TFBSTools.html) documantation.
# Session Info
Here is the output of `sessionInfo()` on the system on which this document was compiled:
```{r session_info, echo=FALSE}
sessionInfo()
```
# Bibliography