Abstract This workshop will give delegates the opportunity to discover and try some of the recent R / Bioconductor developments for proteomics. Topics covered will including support for open community-driven formats for raw data and identification results, packages for peptide-spectrum matching, quantitative proteomics, mass spectrometry (MS) and quantitation data processing, and visualisation. The workshop material will be a self-contained vignette/workflow including example data.
This short tutorial is part of the ProteomicsBioc2014Workshop
package (version 0.2.0), available at https://github.com/ComputationalProteomicsUnit/ProteomicsBioc2014Workshop.
In Bioconductor version 2.14, there are respectively 53 proteomics and 31 mass spectrometry software packages and 7 mass spectrometry experiment packages. These respective packages can be extracted with the proteomicsPackages()
, massSpectrometryPackages()
and massSpectrometryDataPackages()
and explored interactively.
library("RforProteomics")
pp <- proteomicsPackages(biocv)
display(pp)
Type | Format | Package |
---|---|---|
raw | mzML, mzXML, netCDF, mzData | mzR (read) |
identification | mzIdentML | mzID (read) |
quantitation | mzQuantML | |
peak lists | mgf | MSnbase (read/write) |
other | mzTab | MSnbase (read/write) |
Contemporary MS-based proteomics data is disseminated through the ProteomeXchange infrastructure, which centrally coordinates submission, storage and dissemination through multiple data repositories, such as the PRIDE data base at the EBI for MS/MS experiments, PASSEL at the ISB for SRM data and the MassIVE resource. The rpx
is an interface to ProteomeXchange and provides a basic and unified access to PX data.
library("rpx")
pxannounced()
## 15 new ProteomeXchange annoucements
## Data.Set Publication.Data Message
## 1 PXD000605 2014-09-16 13:44:11 New
## 2 PXD000333 2014-09-16 13:08:03 New
## 3 PXD001006 2014-09-16 10:42:25 New
## 4 PXD001000 2014-09-16 10:07:44 New
## 5 PXD000966 2014-09-16 10:01:46 New
## 6 PXD001310 2014-09-16 09:00:47 New
## 7 PXD001061 2014-09-15 08:21:59 Updated information
## 8 PXD001127 2014-09-15 07:53:25 New
## 9 PXD001318 2014-09-13 01:18:16 New
## 10 PXD001317 2014-09-13 00:44:18 New
## 11 PXD001316 2014-09-13 00:19:10 New
## 12 PXD001315 2014-09-13 00:11:14 New
## 13 PXD001314 2014-09-13 00:05:00 New
## 14 PXD001313 2014-09-12 23:13:44 New
## 15 PXD001312 2014-09-12 16:34:37 New
px <- PXDataset("PXD000001")
px
## Object of class "PXDataset"
## Id: PXD000001 with 8 files
## [1] 'F063721.dat' ... [8] 'erwinia_carotovora.fasta'
## Use 'pxfiles(.)' to see all files.
pxfiles(px)
## [1] "F063721.dat"
## [2] "F063721.dat-mztab.txt"
## [3] "PRIDE_Exp_Complete_Ac_22134.xml.gz"
## [4] "PRIDE_Exp_mzData_Ac_22134.xml.gz"
## [5] "PXD000001_mztab.txt"
## [6] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzXML"
## [7] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.raw"
## [8] "erwinia_carotovora.fasta"
Other metadata for the px
dataset:
pxtax(px)
pxurl(px)
pxref(px)
Data files can then be downloaded with the pxget
function as illustrated below. Alternatively, the file is available on the workshop’s Amazon virtual machine in /data/Proteomics/data/
.
mzf <- "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzXML"
mzf <- file.path("/data/Proteomics/data", mzf)
if (!file.exists(mzf))
mzf <- pxget(px, pxfiles(px)[6])
## Downloading 1 file
## TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzXML already present.
mzf
## [1] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzXML"
The mzR
package provides an interface to the proteowizard code base, the legacy RAMP is a non-sequential parser and other C/C++ code to access various raw data files, such as mzML
, mzXML
, netCDF
, and mzData
. The data is accessed on-disk, i.e it does not get loaded entirely in memory by default. The three main functions are openMSfile
to create a file handle to a raw data file, header
to extract metadata about the spectra contained in the file and peaks
to extract one or multiple spectra of interest. Other functions such as instrumentInfo
, or runInfo
can be used to gather general information about a run.
library("mzR")
ms <- openMSfile(mzf)
ms
## Mass Spectrometry file handle.
## Filename: TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzXML
## Number of scans: 7534
hd <- header(ms)
dim(hd)
## [1] 7534 21
names(hd)
## [1] "seqNum" "acquisitionNum"
## [3] "msLevel" "polarity"
## [5] "peaksCount" "totIonCurrent"
## [7] "retentionTime" "basePeakMZ"
## [9] "basePeakIntensity" "collisionEnergy"
## [11] "ionisationEnergy" "lowMZ"
## [13] "highMZ" "precursorScanNum"
## [15] "precursorMZ" "precursorCharge"
## [17] "precursorIntensity" "mergedScan"
## [19] "mergedResultScanNum" "mergedResultStartScanNum"
## [21] "mergedResultEndScanNum"
Extract the index of the MS2 spectrum with the highest base peak intensity and plot its spectrum. Is the data centroided or in profile mode?
The RforProteomics
package distributes a small identification result file (see ?TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzid
) that we load and parse using infrastructure from the mzID
package.
library("mzID")
(f <- dir(system.file("extdata", package = "ProteomicsBioc2014Workshop"),
pattern = "mzid", full.names=TRUE))
## [1] "/home/lgatto/R/x86_64-unknown-linux-gnu-library/3.1/ProteomicsBioc2014Workshop/extdata/TMT_Erwinia.mzid.gz"
id <- mzID(f)
## reading TMT_Erwinia.mzid.gz... DONE!
id
## An mzID object
##
## Software used: MS-GF+ (version: Beta (v10072))
##
## Rawfile: /home/lgatto/dev/00_github/RforProteomics/sandbox/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzXML
##
## Database: /home/lgatto/dev/00_github/RforProteomics/sandbox/erwinia_carotovora.fasta
##
## Number of scans: 5287
## Number of PSM's: 5563
Various data can be extracted from the mzID
object, using one the accessor functions such as database
, scans
, peptides
, … The object can also be converted into a data.frame
using the flatten
function.
Is there a relation between the length of a protein and the number of identified peptides, conditioned by the (average) e-value of the identifications?
While searches are generally performed using third-party software independently of R or can be started from R using a system
call, the rTANDEM
package allows one to execute such searches using the X!Tandem engine. The shinyTANDEM
provides a interactive interface to explore the search results.
library("rTANDEM")
?rtandem
library("shinyTANDEM")
?shinyTANDEM
The above sections introduced low-level interfaces to raw and identification results. The MSnbase
package provides abstractions for raw data through the MSnExp
class and containers for quantification data via the MSnSet
class. Both store the actual assay data (spectra or quantitation matrix) and sample and feature metadata, accessed with spectra
(or the [
, [[
operators) or exprs
, pData
and fData
.
The figure below give a schematics of an MSnSet
instance and the relation between the assay data and the respective feature and sample metadata.
Another useful slot is processingData
, accessed with processingData(.)
, that records all the processing that objects have undergone since their creation (see examples below).
The readMSData
will parse the raw data, extract the MS2 spectra and construct an MS experiment file.
library("MSnbase")
quantFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
full.name = TRUE, pattern = "mzXML$")
quantFile
## [1] "/home/lg390/R/x86_64-unknown-linux-gnu-library/3.2/MSnbase/extdata/dummyiTRAQ.mzXML"
msexp <- readMSData(quantFile, verbose=FALSE)
msexp
## Object of class "MSnExp"
## Object size in memory: 0.2 Mb
## - - - Spectra data - - -
## MS level(s): 2
## Number of MS1 acquisitions: 1
## Number of MSn scans: 5
## Number of precursor ions: 5
## 4 unique MZs
## Precursor MZ's: 437.8 - 716.34
## MSn M/Z range: 100 2016.66
## MSn retention times: 25:1 - 25:2 minutes
## - - - Processing information - - -
## Data loaded: Wed Sep 17 01:23:39 2014
## MSnbase version: 1.13.15
## - - - Meta data - - -
## phenoData
## rowNames: 1
## varLabels: sampleNames
## varMetadata: labelDescription
## Loaded from:
## dummyiTRAQ.mzXML
## protocolData: none
## featureData
## featureNames: X1.1 X2.1 ... X5.1 (5 total)
## fvarLabels: spectrum
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
The identification results stemming from the same raw data file can then be used to add PSM matches.
## find path to a mzIdentML file
identFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
full.name = TRUE, pattern = "mzid$")
identFile
## [1] "/home/lg390/R/x86_64-unknown-linux-gnu-library/3.2/MSnbase/extdata/dummyiTRAQ.mzid"
msexp <- addIdentificationData(msexp, identFile)
## reading dummyiTRAQ.mzid... DONE!
fData(msexp)
## spectrum scan number(s) passthreshold rank calculatedmasstocharge
## X1.1 1 1 TRUE 1 645.0375
## X2.1 2 2 TRUE 1 546.9633
## X3.1 3 NA NA NA NA
## X4.1 4 NA NA NA NA
## X5.1 5 5 TRUE 1 437.2997
## experimentalmasstocharge chargestate ms-gf:denovoscore ms-gf:evalue
## X1.1 645.3741 3 77 79.36958
## X2.1 546.9586 3 39 13.46615
## X3.1 NA NA NA NA
## X4.1 NA NA NA NA
## X5.1 437.8040 2 5 366.38422
## ms-gf:rawscore ms-gf:specevalue assumeddissociationmethod
## X1.1 -39 5.527468e-05 CID
## X2.1 -30 9.399048e-06 CID
## X3.1 NA NA <NA>
## X4.1 NA NA <NA>
## X5.1 -42 2.577830e-04 CID
## isotopeerror isdecoy post pre end start accession length
## X1.1 1 FALSE A R 186 170 ECA0984;ECA3829 231
## X2.1 0 FALSE A K 62 50 ECA1028 275
## X3.1 <NA> NA <NA> <NA> NA NA <NA> NA
## X4.1 <NA> NA <NA> <NA> NA NA <NA> NA
## X5.1 1 FALSE L K 28 22 ECA0510 166
## description
## X1.1 DNA mismatch repair protein;acetolactate synthase isozyme III large subunit
## X2.1 2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase
## X3.1 <NA>
## X4.1 <NA>
## X5.1 putative capsular polysacharide biosynthesis transferase
## pepseq modified modification databaseFile
## X1.1 VESITARHGEVLQLRPK FALSE NA erwinia_carotovora.fasta
## X2.1 IDGQWVTHQWLKK FALSE NA erwinia_carotovora.fasta
## X3.1 <NA> NA NA <NA>
## X4.1 <NA> NA NA <NA>
## X5.1 LVILLFR FALSE NA erwinia_carotovora.fasta
## identFile nprot npep.prot npsm.prot npsm.pep
## X1.1 2 2 1 1 1
## X2.1 2 1 1 1 1
## X3.1 NA NA NA NA NA
## X4.1 NA NA NA NA NA
## X5.1 2 1 1 1 1
msexp[[1]]
## Object of class "Spectrum2"
## Precursor: 645.3741
## Retention time: 25:1
## Charge: 3
## MSn level: 2
## Peaks count: 2921
## Total ion count: 668170086
plot(msexp[[1]], full=TRUE)
as(msexp[[1]], "data.frame")[100:105, ]
## mz i
## 100 141.0990 588594.812
## 101 141.1015 845401.250
## 102 141.1041 791352.125
## 103 141.1066 477623.000
## 104 141.1091 155376.312
## 105 141.1117 4752.541
There are a wide range of proteomics quantitation techniques that can broadly be classified as labelled vs. label-free, depending whether the features are labelled prior the MS acquisition and the MS level at which quantitation is inferred, namely MS1 or MS2.
Label-free | Labelled | |
---|---|---|
MS1 | XIC | SILAC, 15N |
MS2 | Counting | iTRAQ, TMT |
In terms of raw data quantitation, most efforts have been devoted to MS2-level quantitation. Label-free XIC quantitation has however been addressed in the frame of metabolomics data processing by the xcms
infrastructure.
An MSnExp
is converted to an MSnSet
by the quantitation
method. Below, we use the iTRAQ 4-plex isobaric tagging strategy (defined by the iTRAQ4
parameter; other tags are available).
plot(msexp[[1]], full=TRUE, reporters = iTRAQ4)
msset <- quantify(msexp, method = "trap", reporters = iTRAQ4, verbose=FALSE)
exprs(msset)
## iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
## X1.1 4483.320 4873.996 6743.441 4601.378
## X2.1 1918.082 1418.040 1117.601 1581.954
## X3.1 15210.979 15296.256 15592.760 16550.502
## X4.1 4133.103 5069.983 4724.845 4694.801
## X5.1 11947.881 13061.875 12809.491 12911.479
processingData(msset)
## - - - Processing information - - -
## Data loaded: Wed Sep 17 01:23:39 2014
## iTRAQ4 quantification by trapezoidation: Wed Sep 17 01:23:40 2014
## MSnbase version: 1.13.15
Other MS2 quantitation methods available in quantify
include the (normalised) spectral index SI
and (normalised) spectral abundance factor SAF
or simply a simple count method.
exprs(si <- quantify(msexp, method = "SIn"))
## 1
## ECA0510 0.003588641
## ECA1028 0.001470129
exprs(saf <- quantify(msexp, method = "NSAF"))
## 1
## ECA0510 0.6235828
## ECA1028 0.3764172
Note that spectra that have not been assigned any peptide (NA
) or that match non-unique peptides (npsm > 1
) are discarded in the counting process.
See also The isobar
package supports quantitation from centroided mgf
peak lists or its own tab-separated files that can be generated from Mascot and Phenyx vendor files.
The PSI mzTab
file format is aimed at providing a simpler (than XML formats) and more accessible file format to the wider community. It is composed of a key-value metadata section and peptide/protein/small molecule tabular sections.
mztf <- pxget(px, pxfiles(px)[2])
## Downloading 1 file
## F063721.dat-mztab.txt already present.
(mzt <- readMzTabData(mztf, what = "PEP"))
## Warning: Support for mzTab version 0.9 only. Support will be added soon.
## Detected a metadata section
## Detected a peptide section
## MSnSet (storageMode: lockedEnvironment)
## assayData: 1528 features, 6 samples
## element names: exprs
## protocolData: none
## phenoData
## rowNames: sub[1] sub[2] ... sub[6] (6 total)
## varLabels: abundance
## varMetadata: labelDescription
## featureData
## featureNames: 1 2 ... 1528 (1528 total)
## fvarLabels: sequence accession ... uri (14 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
## - - - Processing information - - -
## mzTab read: Wed Sep 17 01:23:42 2014
## MSnbase version: 1.13.15
It is also possible to import arbitrary spreadsheets as MSnSet
objects into R with the readMSnSet2
function. The main 2 arguments of the function are (1) a text-based spreadsheet and (2) column names of indices that identify the quantitation data.
csv <- dir(system.file ("extdata" , package = "pRolocdata"),
full.names = TRUE, pattern = "pr800866n_si_004-rep1.csv")
getEcols(csv, split = ",")
## [1] "\"Protein ID\"" "\"FBgn\""
## [3] "\"Flybase Symbol\"" "\"No. peptide IDs\""
## [5] "\"Mascot score\"" "\"No. peptides quantified\""
## [7] "\"area 114\"" "\"area 115\""
## [9] "\"area 116\"" "\"area 117\""
## [11] "\"PLS-DA classification\"" "\"Peptide sequence\""
## [13] "\"Precursor ion mass\"" "\"Precursor ion charge\""
## [15] "\"pd.2013\"" "\"pd.markers\""
ecols <- 7:10
res <- readMSnSet2(csv, ecols)
head(exprs(res))
## area.114 area.115 area.116 area.117
## 1 0.379000 0.281000 0.225000 0.114000
## 2 0.420000 0.209667 0.206111 0.163889
## 3 0.187333 0.167333 0.169667 0.476000
## 4 0.247500 0.253000 0.320000 0.179000
## 5 0.216000 0.183000 0.342000 0.259000
## 6 0.072000 0.212333 0.573000 0.142667
head(fData(res))
## Protein.ID FBgn Flybase.Symbol No..peptide.IDs Mascot.score
## 1 CG10060 FBgn0001104 G-ialpha65A 3 179.86
## 2 CG10067 FBgn0000044 Act57B 5 222.40
## 3 CG10077 FBgn0035720 CG10077 5 219.65
## 4 CG10079 FBgn0003731 Egfr 2 86.39
## 5 CG10106 FBgn0029506 Tsp42Ee 1 52.10
## 6 CG10130 FBgn0010638 Sec61beta 2 79.90
## No..peptides.quantified PLS.DA.classification Peptide.sequence
## 1 1 PM
## 2 9 PM
## 3 3
## 4 2 PM
## 5 1 GGVFDTIQK
## 6 3 ER/Golgi
## Precursor.ion.mass Precursor.ion.charge pd.2013 pd.markers
## 1 PM unknown
## 2 PM unknown
## 3 unknown unknown
## 4 PM unknown
## 5 626.887 2 Phenotype 1 unknown
## 6 ER/Golgi ER
Each different types of quantitative data will require their own pre-processing and normalisation steps. Both isobar
and MSnbase
allow to correct for isobaric tag impurities normalise the quantitative data.
data(itraqdata)
qnt <- quantify(itraqdata, method = "trap",
reporters = iTRAQ4, verbose = FALSE)
impurities <- matrix(c(0.929,0.059,0.002,0.000,
0.020,0.923,0.056,0.001,
0.000,0.030,0.924,0.045,
0.000,0.001,0.040,0.923),
nrow=4, byrow = TRUE)
## or, using makeImpuritiesMatrix()
## impurities <- makeImpuritiesMatrix(4)
qnt.crct <- purityCorrect(qnt, impurities)
processingData(qnt.crct)
## - - - Processing information - - -
## Data loaded: Wed May 11 18:54:39 2011
## iTRAQ4 quantification by trapezoidation: Wed Sep 17 01:23:44 2014
## Purity corrected: Wed Sep 17 01:23:44 2014
## MSnbase version: 1.1.22
plot0 <- function(x, y, main = "") {
old.par <- par(no.readonly = TRUE)
on.exit(par(old.par))
par(mar = c(4, 4, 1, 1))
par(mfrow = c(2, 2))
sx <- sampleNames(x)
sy <- sampleNames(y)
for (i in seq_len(ncol(x))) {
plot(exprs(x)[, i], exprs(y)[, i], log = "xy",
xlab = sx[i], ylab = sy[i])
grid()
}
}
plot0(qnt, qnt.crct)
Various normalisation methods can be applied the MSnSet
instances using the normalise
method: variance stabilisation (vsn
), quantile (quantiles
), median or mean centring (center.media
or center.mean
), …
qnt.crct.nrm <- normalise(qnt.crct,"quantiles")
plot0(qnt, qnt.crct.nrm)
The combineFeatures
method combines spectra/peptides quantitation values into protein data. The grouping is defined by the groupBy
parameter, which is generally taken from the feature metadata (protein accessions, for example).
## arbitraty grouping
g <- factor(c(rep(1, 25), rep(2, 15), rep(3, 15)))
prt <- combineFeatures(qnt.crct.nrm, groupBy = g, fun = "sum")
## Combined 55 features into 3 using sum
processingData(prt)
## - - - Processing information - - -
## Data loaded: Wed May 11 18:54:39 2011
## iTRAQ4 quantification by trapezoidation: Wed Sep 17 01:23:44 2014
## Purity corrected: Wed Sep 17 01:23:44 2014
## Normalised (quantiles): Wed Sep 17 01:23:44 2014
## Combined 55 features into 3 using sum: Wed Sep 17 01:23:44 2014
## MSnbase version: 1.1.22
Finally, proteomics data analysis is generally hampered by missing values. Missing data imputation is a sensitive operation whose success will be guided by many factors, such as degree and (non-)random nature of the missingness. Missing value in MSnSet
instances can be filtered out and imputed using the filterNA
and impute
functions.
set.seed(1)
qnt0 <- qnt
exprs(qnt0)[sample(prod(dim(qnt0)), 10)] <- NA
table(is.na(qnt0))
##
## FALSE TRUE
## 209 11
qnt00 <- filterNA(qnt0)
dim(qnt00)
## [1] 44 4
qnt.imp <- impute(qnt0)
plot0(qnt, qnt.imp)
The mzt
instance created from the mzTab
file has the following is a TMT 6-plex with the following design:
In this TMT 6-plex experiment, four exogenous proteins were spiked into an equimolar Erwinia carotovora lysate with varying proportions in each channel of quantitation; yeast enolase (ENO) at 10:5:2.5:1:2.5:10, bovine serum albumin (BSA) at 1:2.5:5:10:5:1, rabbit glycogen phosphorylase (PHO) at 2:2:2:2:1:1 and bovin cytochrome C (CYT) at 1:1:1:1:1:2. Proteins were then digested, differentially labelled with TMT reagents, fractionated by reverse phase nanoflow UPLC (nanoACQUITY, Waters), and analysed on an LTQ Orbitrap Velos mass spectrometer (Thermo Scientic).
Explore the mzt
data using some of the illustrated functions. The heatmap and MAplot (see MAplot
function), taken from the RforProteomics
vignette, have been produced using the same data.
R in general and Bioconductor in particular are well suited for the statistical analysis of data. Several packages provide dedicated resources for proteomics data:
MSstats
: A set of tools for statistical relative protein significance analysis in DDA, SRM and DIA experiments.
msmsTest
: Statistical tests for label-free LC-MS/MS data by spectral counts, to discover differentially expressed proteins between two biological conditions. Three tests are available: Poisson GLM regression, quasi-likelihood GLM regression, and the negative binomial of the edgeR package.
isobar
also provides dedicated infrastructure for the statistical analysis of isobaric data.The MLInterfaces
package provides a unified interface to a wide range of machine learning algorithms. Initially developed for microarray and ExpressionSet
instances, the pRoloc
package enables application of these algorithms to MSnSet
data.
library("MLInterfaces")
library("pRoloc")
library("pRolocdata")
data(dunkley2006)
traininds <- which(fData(dunkley2006)$markers != "unknown")
ans <- MLearn(markers ~ ., data = t(dunkley2006), knnI(k = 5), traininds)
ans
## MLInterfaces classification output container
## The call was:
## MLearn(formula = markers ~ ., data = t(dunkley2006), .method = knnI(k = 5),
## trainInd = traininds)
## Predicted outcome distribution for test set:
##
## ER Golgi mit/plastid PM vacuole
## 204 87 128 111 17
## Summary of scores on test set (use testScores() method for details):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4000 1.0000 1.0000 0.9653 1.0000 1.0000
kcl <- MLearn( ~ ., data = dunkley2006, kmeansI, centers = 12)
kcl
## clusteringOutput: partition table
##
## 1 2 3 4 5 6 7 8 9 10 11 12
## 78 48 29 118 43 38 116 94 9 32 49 35
## The call that created this object was:
## MLearn(formula = ~., data = dunkley2006, .method = kmeansI, centers = 12)
plot(kcl, exprs(dunkley2006))
hcl <- MLearn( ~ ., data = t(dunkley2006), hclustI(distFun = dist, cutParm = list(k = 4)))
hcl
## clusteringOutput: partition table
##
## 1 2 3 4
## 4 4 4 4
## The call that created this object was:
## MLearn(formula = ~., data = t(dunkley2006), .method = hclustI(distFun = dist,
## cutParm = list(k = 4)))
plot(hcl, exprs(t(dunkley2006)))
All the Bioconductor annotation infrastructure, such as biomaRt
, GO.db
, organism specific annotations, .. are directly relevant to the analysis of proteomics data. Some proteomics-centred annotations such as the PSI Mass Spectrometry Ontology, Molecular Interaction (PSI MI 2.5) or Protein Modifications are available through the rols
. Data from the Human Protein Atlas is available via the hpar
package.
isobar
.synapter
.pRoloc
.MALDIquant
package.PSICQUIC
package.Additional relevant packages are described in the RforProteomics
vignette.
## R Under development (unstable) (2014-08-05 r66309)
## Platform: x86_64-unknown-linux-gnu (64-bit)
##
## attached base packages:
## [1] parallel methods stats graphics grDevices utils datasets
## [8] base
##
## other attached packages:
## [1] shinyTANDEM_1.3.0 xtable_1.7-3 mixtools_1.0.2
## [4] segmented_0.4-0.0 MASS_7.3-34 boot_1.3-11
## [7] shiny_0.10.1.9006 rTANDEM_1.5.1 data.table_1.9.2
## [10] BiocInstaller_1.15.5 pRolocdata_1.3.5 pRoloc_1.5.16
## [13] MLInterfaces_1.45.3 cluster_1.15.2 annotate_1.43.5
## [16] XML_3.98-1.1 AnnotationDbi_1.27.9 GenomeInfoDb_1.1.18
## [19] IRanges_1.99.25 S4Vectors_0.1.5 rpx_1.1.1
## [22] mzID_1.3.5 RforProteomics_1.3.5 MSnbase_1.13.15
## [25] BiocParallel_0.99.19 mzR_1.99.0 Rcpp_0.11.2
## [28] Biobase_2.25.0 BiocGenerics_0.11.4 knitr_1.6.12
##
## loaded via a namespace (and not attached):
## [1] affy_1.43.3 affyio_1.33.0
## [3] BatchJobs_1.3 BBmisc_1.7
## [5] biocViews_1.33.14 BradleyTerry2_1.0-5
## [7] brew_1.0-6 brglm_0.5-9
## [9] car_2.0-21 caret_6.0-35
## [11] Category_2.31.1 checkmate_1.3
## [13] class_7.3-11 codetools_0.2-9
## [15] colorspace_1.2-4 DBI_0.3.0
## [17] digest_0.6.4 doParallel_1.0.8
## [19] e1071_1.6-4 evaluate_0.5.5
## [21] fail_1.2 FNN_1.1
## [23] foreach_1.4.2 formatR_1.0
## [25] gdata_2.13.3 genefilter_1.47.6
## [27] ggplot2_1.0.0 graph_1.43.0
## [29] grid_3.2.0 gridSVG_1.4-0
## [31] GSEABase_1.27.1 gtable_0.1.2
## [33] gtools_3.4.1 htmltools_0.2.6
## [35] httpuv_1.3.0 impute_1.39.0
## [37] interactiveDisplay_1.3.9 iterators_1.0.7
## [39] kernlab_0.9-19 labeling_0.3
## [41] lattice_0.20-29 limma_3.21.14
## [43] lme4_1.1-7 lpSolve_5.6.10
## [45] MALDIquant_1.11 Matrix_1.1-4
## [47] mclust_4.3 minqa_1.2.3
## [49] munsell_0.4.2 mvtnorm_1.0-0
## [51] nlme_3.1-117 nloptr_1.0.4
## [53] nnet_7.3-8 pcaMethods_1.55.0
## [55] pls_2.4-3 plyr_1.8.1
## [57] preprocessCore_1.27.1 proto_0.3-10
## [59] proxy_0.4-12 R6_2.0
## [61] randomForest_4.6-10 RBGL_1.41.1
## [63] RColorBrewer_1.0-5 RCurl_1.95-4.3
## [65] rda_1.0.2-2 reshape2_1.4
## [67] RJSONIO_1.3-0 R.methodsS3_1.6.1
## [69] R.oo_1.18.0 rpart_4.1-8
## [71] RSQLite_0.11.4 RUnit_0.4.26
## [73] R.utils_1.33.0 sampling_2.6
## [75] scales_0.2.4 sendmailR_1.1-2
## [77] sfsmisc_1.0-26 splines_3.2.0
## [79] stats4_3.2.0 stringr_0.6.2
## [81] survival_2.37-7 tools_3.2.0
## [83] vsn_3.33.0 zlibbioc_1.11.1