%\VignetteEngine{knitr}
%\VignetteIndexEntry{An interface to proteomics data repositories}
%\VignetteKeywords{Infrastructure, Bioinformatics, Proteomics, Mass spectrometry}
%\VignettePackage{rpx}

\documentclass[12pt]{article}

<>=
BiocStyle::latex()
@

\usepackage{ae} %% don't like the helvetica font

\bioctitle{\Biocpkg{rpx}: an \R{} interface to the ProteomeXchange repository}

\author{
  Laurent Gatto\\
  \email{lg390@cam.ac.uk}\\
  Computational Proteomics Unit\footnote{\url{http://cpu.sysbiol.cam.ac.uk}}
}

\begin{document}

\maketitle

% \tableofcontents
% \newpage

<>=
suppressPackageStartupMessages(library("Biostrings"))
suppressPackageStartupMessages(library("MSnbase"))
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Section
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}\label{sec:intro}

The goal of the \Biocpkg{rpx} package is to provide programmatic
access to proteomics data from \R{}, in particular to the
ProteomeXchange\footnote{
  Vizca\'ino J.A. et al. \textit{ProteomeXchange: globally
    co-ordinated proteomics data submission and dissemination},
  Nature Biotechnology 2014, 32, 223--226, doi:10.1038/nbt.2839.
}
(PX) central repository (see \url{http://www.proteomexchange.org/}
and \url{http://central.proteomexchange.org/}). Additional
repositories are likely to be added in the future.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Section
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The \Biocpkg{rpx} package}\label{sec:functions}

\subsection*{\Robject{PXDataset} objects}

The central object that handles data access is the
\Robject{PXDataset} class. An instance can be generated by passing a
valid PX experiment identifier to the \Rfunction{PXDataset}
constructor.

<>=
library("rpx")
id <- "PXD000001"
px <- PXDataset(id)
px
@

\subsection*{Data and meta-data}

Several attributes can be extracted from a \Robject{PXDataset}
instance, as described below.
\bigskip

The experiment identifier that was originally used to create the
\Robject{PXDataset} instance can be extracted with the
\Rfunction{pxid} method:

<>=
pxid(px)
@

The file transfer URL where the data files can be accessed can be
queried with the \Rfunction{pxurl} method:

<>=
pxurl(px)
@

The species from which the data were generated can be obtained by
calling the \Rfunction{pxtax} function:

<>=
pxtax(px)
@

Relevant bibliographic references can be queried with the
\Rfunction{pxref} method:

<>=
strwrap(pxref(px))
@

All files available for the PX experiment can be obtained with the
\Rfunction{pxfiles} method:

<>=
pxfiles(px)
@

The complete or partial data set can be downloaded with the
\Rfunction{pxget} function. The function takes an instance of class
\Robject{PXDataset} as its first, mandatory argument. The next
argument, \Rcode{list}, specifies which files to download. If missing,
a menu is printed and the user can select a file. If set to
\Rcode{"all"}, all files of the experiment are downloaded into the
working directory. Alternatively, numerics or logicals can be used to
subset the relevant files to be downloaded based on the
\Rfunction{pxfiles(.)} output. The last argument, \Rcode{force}, can
be set to \Rcode{TRUE} to force the download of files that already
exist in the working directory.

<>=
pxget(px, "erwinia_carotovora.fasta")
dir(pattern = "fasta")
@

By default, \Rfunction{pxget} will not download and overwrite a file
that is already available in the working directory, unless
\Rcode{force} is set to \Rcode{TRUE}.
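
The selection modes and the \Rcode{force} argument described above can
be sketched as follows. This is only an illustration, assuming that
\Rfunction{pxget} accepts the \Rcode{list} and \Rcode{force} arguments
exactly as described in the text. The chunk label is arbitrary and the
chunk is marked \Rcode{eval=FALSE}, so nothing is downloaded when the
vignette is built.

<<pxget-selection, eval=FALSE>>=
## Assumes 'px' is the PXDataset instance created earlier.
f <- pxfiles(px)

## Logical index: TRUE for every file whose name contains "fasta"
pxget(px, grepl("fasta", f))

## Numeric index: the first file listed by pxfiles(px)
pxget(px, 1)

## Re-download a file even if a copy already exists in the working directory
pxget(px, "erwinia_carotovora.fasta", force = TRUE)

## Every file of the experiment (potentially a large download)
pxget(px, "all")
@

Selecting by name or by logical index is less fragile than a
hard-coded numeric position, since it does not depend on the order in
which \Rfunction{pxfiles} lists the files.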
<>=
(i <- grep("fasta", pxfiles(px)))
pxget(px, i) ## same file as before, now selected by index
@

Finally, a list of recent PX additions and updates can be obtained
using the \Rfunction{pxannounced()} function:

<>=
pxannounced()
@

\subsection*{A simple use-case}

Below, we show how to automate the extraction of files of interest
(fasta and mzTab files), download them and read them using the
appropriate Bioconductor infrastructure.

<>=
(mzt <- grep("F0.+mztab", pxfiles(px), value = TRUE))
(fas <- grep("fasta", pxfiles(px), value = TRUE))
pxget(px, c(mzt, fas))
library("Biostrings")
readAAStringSet(fas)
library("MSnbase")
(x <- readMzTabData(mzt, "PEP"))
head(exprs(x))
head(fData(x)[, 1:2])
@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Section
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Session information}\label{sec:sessionInfo}

<>=
toLatex(sessionInfo(), locale = FALSE)
@

%% \bibliography{rpx}

\end{document}