---
title: "An R interface to the ProteomeXchange repository"
author:
- name: Laurent Gatto
package: rpx
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{An R interface to the ProteomeXchange repository}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteKeywords{Infrastructure, Bioinformatics, Proteomics, Mass spectrometry}
  %\VignetteEncoding{UTF-8}
---

```{r env, echo = FALSE}
suppressPackageStartupMessages(library("BiocStyle"))
suppressPackageStartupMessages(library("Biostrings"))
```


# Introduction

The goal of the `r Biocpkg("rpx")` package is to provide programmatic
access to proteomics data from R, in particular to the ProteomeXchange
([Vizcaíno J.A. et al.,
2014](https://www.nature.com/articles/nbt.2839/)) central repository
(see http://www.proteomexchange.org/ and
http://central.proteomexchange.org/). Additional repositories are
likely to be added in the future.


# The `r Biocpkg("rpx")` package

## PXDataset objects

The central class that handles data access is `PXDataset` (version
2). An instance is created by passing a valid ProteomeXchange (PX)
experiment identifier, such as `"PXD000001"`, to the `PXDataset()`
constructor.

```{r pxdata}
library("rpx")
id <- "PXD000001"
px <- PXDataset(id)
px
```
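
The text above does not spell out what makes an identifier valid. As
a rough illustration only, the sketch below assumes that
ProteomeXchange dataset identifiers take the form `PXD` followed by
six digits, as in `PXD000001`, and checks candidate strings against
that pattern, which could help catch typos before calling
`PXDataset()`. The helper `is_px_id()` is hypothetical and not part of
`rpx`, and a well-formed identifier does not guarantee that a matching
public dataset exists.

```{r pxid-check}
## Hypothetical helper, not part of rpx. It assumes PX identifiers
## are "PXD" followed by exactly six digits.
is_px_id <- function(x) grepl("^PXD[0-9]{6}$", x)

is_px_id("PXD000001")             # well-formed
is_px_id("PXD1")                  # too few digits
is_px_id(c("PXD000001", "PXD1"))  # vectorised over several candidates
```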

## Data and meta-data

Several attributes can be extracted from a `PXDataset` project, as
described below.


The experiment identifier that was originally used to create the
project can be extracted with the `pxid()` method:

```{r pxid}
pxid(px)
```

The URL from which the data files can be accessed is returned by the
`pxurl()` method:

```{r purl}
pxurl(px)
```

The species from which the data were generated can be obtained by
calling the `pxtax()` function:

```{r pxtax}
pxtax(px)
```

Relevant bibliographic references can be queried with the
`pxref()` method:

```{r pxref}
strwrap(pxref(px))
```

All files available for the PX experiment can be obtained with the
`pxfiles()` method:

```{r pxfiles}
pxfiles(px)
```

The complete data set, or part of it, can be downloaded with the
`pxget()` function, which takes a `PXDataset` instance as its first,
mandatory argument.

The second argument, `list`, specifies which files to download. If it
is missing, a menu is printed from which the user can select a
file. If it is set to `"all"`, all files of the experiment are
downloaded. Alternatively, one or more file names, their indices, or a
logical vector can be passed to download specific files.

```{r pxget}
f <- pxget(px, "F063721.dat-mztab.txt")
f
```
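
To make the selection options more concrete, the sketch below uses a
short, hypothetical vector of file names standing in for the output of
`pxfiles(px)`; it does not contact the server. It assumes that names,
indices and logical vectors pick out files the way ordinary R
subsetting of that vector does, and shows that the three forms can
designate the same file.

```{r pxget-select}
## Hypothetical file names, standing in for pxfiles(px)
fls <- c("README.txt",
         "F063721.dat-mztab.txt",
         "sequences.fasta")

## Three ways of designating the mzTab file
by_name  <- "F063721.dat-mztab.txt"
by_index <- 2L
by_lgl   <- grepl("mztab", fls)

fls[match(by_name, fls)]  # selection by name
fls[by_index]             # selection by index
fls[by_lgl]               # selection by logical vector
which(by_lgl)             # the logical vector corresponds to index 2
```

Under the same assumption, a logical vector built from the real file
list, for example with `grepl("\\.fasta$", pxfiles(px))`, could be
passed as the `list` argument of `pxget()`.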

The `rpx` package makes use of the `r Biocpkg("BiocFileCache")`
package to avoid repeatedly downloading data. When `PXDataset`
projects are created and project files are downloaded, they are stored
in the package's central cache or in a user-defined cache. The next
time the project is instantiated with `PXDataset()` or a project file
is downloaded with `pxget()`, existing artefacts are retrieved from
the cache instead of being created or downloaded from the remote
server again. See `?rpxCache` for details about caching.
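
The cache-or-download logic described above can be illustrated with a
minimal base R sketch. This is not how `rpx` or BiocFileCache are
implemented: the helpers `fetch_cached()` and `fake_download()` are
hypothetical, the cache is a plain temporary directory, and the
"download" only writes a small local file. The point is that the
second request for the same file is served from the cache without
running the download step again.

```{r cache-sketch}
## Minimal sketch of the cache-or-download pattern; NOT rpx's code.
## fetch_cached() and fake_download() are hypothetical helpers.
cache_dir <- tempfile("rpx-sketch-cache")  # fresh, empty cache directory
dir.create(cache_dir)

n_downloads <- 0
fake_download <- function(fname, dest) {
    ## Stands in for a transfer from the remote server
    n_downloads <<- n_downloads + 1
    writeLines(paste("contents of", fname), dest)
}

fetch_cached <- function(fname) {
    dest <- file.path(cache_dir, fname)
    if (!file.exists(dest))   # download only on a cache miss
        fake_download(fname, dest)
    dest                      # path to the cached copy
}

f1 <- fetch_cached("F063721.dat-mztab.txt")  # cache miss: downloads
f2 <- fetch_cached("F063721.dat-mztab.txt")  # cache hit: no download
n_downloads
```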

<!-- Finally, a list of recent PX additions and updates can be obtained -->
<!-- using the `pxannounced()` function: -->

<!-- ```{r pxan} -->
<!-- pxannounced() -->
<!-- ``` -->

## A simple use-case

Below, we download the FASTA file from the PXD000001 dataset and load
it with the `r Biocpkg("Biostrings")` package.

```{r more, warning=FALSE}
fas <- grep("fasta", pxfiles(px), value = TRUE)
fas
f <- pxget(px, fas) ## retrieved from the rpx cache if previously downloaded
f
```

```{r example1, message = FALSE}
library("Biostrings")
readAAStringSet(f)
```
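
`readAAStringSet()` returns an `AAStringSet` object with one element
per FASTA record. The self-contained sketch below writes a
hypothetical two-protein FASTA file to a temporary location, so that
the result can be inspected without network access, and queries the
number of sequences, their names and their lengths with standard
Biostrings accessors. The sequences are made up for illustration.

```{r fasta-sketch, message = FALSE}
library("Biostrings")

## Hypothetical two-protein FASTA file, for illustration only
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">prot1", "MKTAYIAKQR",
             ">prot2", "GSHMLEDP"), tmp)

aa <- readAAStringSet(tmp)
length(aa)  # number of sequences
names(aa)   # FASTA headers
width(aa)   # sequence lengths
```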

# Questions and help

Either post questions on the [Bioconductor support
forum](https://support.bioconductor.org/) or open a GitHub
[issue](https://github.com/lgatto/rpx/issues).

# Session information

```{r si}
sessionInfo()
```