---
title: "An R interface to the ProteomeXchange repository"
author:
- name: Laurent Gatto
package: rpx
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{An R interface to the ProteomeXchange repository}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteKeywords{Infrastructure, Bioinformatics, Proteomics, Mass spectrometry}
  %\VignetteEncoding{UTF-8}
---

```{r env, echo = FALSE}
suppressPackageStartupMessages(library("BiocStyle"))
suppressPackageStartupMessages(library("Biostrings"))
```

# Introduction

The goal of the `r Biocpkg("rpx")` package is to provide programmatic
access to proteomics data from R, in particular to the ProteomeXchange
([Vizcaino J.A. et al, 2014](https://www.nature.com/articles/nbt.2839/))
central repository (see http://www.proteomexchange.org/ and
http://central.proteomexchange.org/). Additional repositories are
likely to be added in the future.

# The `r Biocpkg("rpx")` package

## PXDataset objects

The central object that handles data access is the `PXDataset`
(version 2) class. An instance is created by passing a valid PX
experiment identifier to the `PXDataset()` constructor.

```{r pxdata}
library("rpx")
id <- "PXD000001"
px <- PXDataset(id)
px
```

## Data and meta-data

Several attributes can be extracted from a `PXDataset` object, as
described below.

The experiment identifier that was originally used to create the
project can be extracted with the `pxid()` method:

```{r pxid}
pxid(px)
```

The file transfer URL where the data files can be accessed can be
queried with the `pxurl()` method:

```{r purl}
pxurl(px)
```

The species from which the data were generated can be obtained by
calling the `pxtax()` function:

```{r pxtax}
pxtax(px)
```

Relevant bibliographic references can be queried with the `pxref()`
method:

```{r pxref}
strwrap(pxref(px))
```

All files available for the PX experiment can be obtained with the
`pxfiles()` method:

```{r pxfiles}
pxfiles(px)
```

The complete or partial data set can be downloaded with the `pxget()`
function. The function takes a project instance as its first,
mandatory argument. The next argument, `list`, specifies which files
to download. If missing, a menu is printed and the user can select a
file. If set to `"all"`, all files of the experiment are downloaded.
One or multiple file names, their indices or logicals can also be
used to download specific files.

```{r pxget}
f <- pxget(px, "F063721.dat-mztab.txt")
f
```

The `rpx` package makes use of the `r Biocpkg("BiocFileCache")`
package to avoid repeatedly downloading data. When `PXDataset`
projects are created and project files are downloaded, they are
stored in the package's central cache or in a user-defined cache. The
next time the project is instantiated with `PXDataset()` or a project
file is downloaded with `pxget()`, existing artefacts will be
retrieved from the cache instead of being created or downloaded from
the remote server again. See `?rpxCache` for details about caching.

## A simple use-case

Below, we download the fasta file from the PXD000001 dataset and load
it with the Biostrings package.

```{r more, warning=FALSE}
fas <- grep("fasta", pxfiles(px), value = TRUE)
fas
f <- pxget(px, fas) ## file available in the rpx cache
f
```

```{r example1, message = FALSE}
library("Biostrings")
readAAStringSet(f)
```

# Questions and help

Either post questions on the
[Bioconductor support forum](https://support.bioconductor.org/) or
open a GitHub [issue](https://github.com/lgatto/rpx/issues).

# Session information

```{r si}
sessionInfo()
```