scp 1.2.0
scp packageThe scp package is used to process and analyse mass spectrometry
(MS)-based single cell proteomics (SCP) data. The functions rely on a
specific data structure that wraps
QFeatures
objects (Gatto (2020)) around
SingleCellExperiment
objects (Amezquita et al. (2019)). This data structure could be seen as
Matryoshka dolls were the SingleCellExperiment objects are small
dolls contained in the bigger QFeatures doll.
The SingleCellExperiment class provides a dedicated framework for
single-cell data. The SingleCellExperiment serves as an interface to
many cutting-edge methods for processing, visualizing and analysis
single-cell data. More information about the SingleCellExperiment
class and associated methods can be found in the
OSCA book.
The QFeatures class is a data framework dedicated to manipulate and
process MS-based quantitative data. It preserves the relationship
between the different levels of information: peptide to spectrum match
(PSM) data, peptide data and protein data. The QFeatures package
also provides an interface to many utility functions to streamline the
processing MS data. More information about MS data analysis tools can
be found in the
RforMassSpectrometry project.
(#fig:scp_framework)scp relies on SingleCellExperiment and QFeatures objects.
Before running the vignette we need to load the scp package.
library("scp")We also load ggplot2, magrittr and dplyr for convenient data
manipulation and plotting.
library("ggplot2")
library("magrittr")
library("dplyr")This vignette will guide you through some common steps of mass
spectrometry-based single-cell proteomics (SCP) data analysis. SCP is
an emerging field and further research is required to develop a
principled analysis workflow. Therefore, we do not guarantee that the
steps presented here are the best steps for this type of data analysis.
This vignette performs the steps that were described in the SCoPE2
landmark paper (Specht et al. (2021)). We hope to convince the reader that,
although the workflow is probably not optimal, scp has the full
potential to perform standardized and principled data analysis. All
functions presented here are comprehensively documented, highly modular,
can easily be extended with new algorithms. Suggestions, feature
requests or bug reports are warmly welcome. Feel free to open an issue
in the
GitHub repository.
The first step is to read in the PSM quantification table
generated by, for example, MaxQuant (Tyanova, Temu, and Cox (2016)). We created a
small example data by subsetting the MaxQuant evidence.txt table
provided in the SCoPE2 landmark paper (Specht et al. (2021)). The
mqScpData table is a typical example of what you would get after
reading in a CSV file using read.csv or read.table. See
?mqScpData for more information about the table content.
data("mqScpData")
dim(mqScpData)
#> [1] 1361  149
mqScpData[1:10, 1:5]
#>                                                                                 uid
#> 1          _(Acetyl (Protein N-term))ATNFLAHEK_ 2 0.00052636 190321S_LCA10_X_FP97AG
#> 2           _(Acetyl (Protein N-term))ATNFLAHEK_ 2 0.00058789 190222S_LCA9_X_FP94BM
#> 3  _(Acetyl (Protein N-term))SHTILLVQPTK_ 2 4.0315e-24 190914S_LCB3_X_16plex_Set_21
#> 4         _(Acetyl (Protein N-term))SHTILLVQPTK_ 2 4.7622e-06 190222S_LCA9_X_FP94BM
#> 5        _(Acetyl (Protein N-term))SHTILLVQPTK_ 2 6.8709e-09 190321S_LCA10_X_FP97AG
#> 6               _(Acetyl (Protein N-term))SLVIPEK_ 2 0.053705 190222S_LCA9_X_FP94BM
#> 7          _(Acetyl (Protein N-term))SLVIPEK_ 2 0.0595 190914S_LCB3_X_16plex_Set_21
#> 8            _(Acetyl (Protein N-term))VDREQLVQK_ 2 0.004872 190321S_LCA10_X_FP97AG
#> 9                                        _AAGLALK_ 2 0.020042 190222S_LCA9_X_FP94BM
#> 10                                      _AAGLALK_ 2 0.038752 190321S_LCA10_X_FP97AG
#>       Sequence Length           Modifications
#> 1    ATNFLAHEK      9 Acetyl (Protein N-term)
#> 2    ATNFLAHEK      9 Acetyl (Protein N-term)
#> 3  SHTILLVQPTK     11 Acetyl (Protein N-term)
#> 4  SHTILLVQPTK     11 Acetyl (Protein N-term)
#> 5  SHTILLVQPTK     11 Acetyl (Protein N-term)
#> 6      SLVIPEK      7 Acetyl (Protein N-term)
#> 7      SLVIPEK      7 Acetyl (Protein N-term)
#> 8    VDREQLVQK      9 Acetyl (Protein N-term)
#> 9      AAGLALK      7              Unmodified
#> 10     AAGLALK      7              Unmodified
#>                         Modified.sequence
#> 1    _(Acetyl (Protein N-term))ATNFLAHEK_
#> 2    _(Acetyl (Protein N-term))ATNFLAHEK_
#> 3  _(Acetyl (Protein N-term))SHTILLVQPTK_
#> 4  _(Acetyl (Protein N-term))SHTILLVQPTK_
#> 5  _(Acetyl (Protein N-term))SHTILLVQPTK_
#> 6      _(Acetyl (Protein N-term))SLVIPEK_
#> 7      _(Acetyl (Protein N-term))SLVIPEK_
#> 8    _(Acetyl (Protein N-term))VDREQLVQK_
#> 9                               _AAGLALK_
#> 10                              _AAGLALK_In order to convert this tabular data to a scp-compatible
QFeatures object, we need to provide a metadata table where rows
contain sample information and columns must contain at least:
Any additional information about the samples will be stored in the
colData.
We provide an example of such a data frame. It was formatted from the
annotation table provided in the SCoPE2 article. See
?sampleAnnotation for more information about the table content.
data("sampleAnnotation")
sampleAnnotation
#>                         Raw.file               Channel SampleType lcbatch
#> 1          190222S_LCA9_X_FP94BM  Reporter.intensity.1    Carrier    LCA9
#> 2          190222S_LCA9_X_FP94BM  Reporter.intensity.2  Reference    LCA9
#> 3          190222S_LCA9_X_FP94BM  Reporter.intensity.3     Unused    LCA9
#> 4          190222S_LCA9_X_FP94BM  Reporter.intensity.4   Monocyte    LCA9
#> 5          190222S_LCA9_X_FP94BM  Reporter.intensity.5      Blank    LCA9
#> 6          190222S_LCA9_X_FP94BM  Reporter.intensity.6   Monocyte    LCA9
#> 7          190222S_LCA9_X_FP94BM  Reporter.intensity.7 Macrophage    LCA9
#> 8          190222S_LCA9_X_FP94BM  Reporter.intensity.8 Macrophage    LCA9
#> 9          190222S_LCA9_X_FP94BM  Reporter.intensity.9 Macrophage    LCA9
#> 10         190222S_LCA9_X_FP94BM Reporter.intensity.10 Macrophage    LCA9
#> 11         190222S_LCA9_X_FP94BM Reporter.intensity.11 Macrophage    LCA9
#> 12         190222S_LCA9_X_FP94BM Reporter.intensity.12     Unused    LCA9
#> 13         190222S_LCA9_X_FP94BM Reporter.intensity.13     Unused    LCA9
#> 14         190222S_LCA9_X_FP94BM Reporter.intensity.14     Unused    LCA9
#> 15         190222S_LCA9_X_FP94BM Reporter.intensity.15     Unused    LCA9
#> 16         190222S_LCA9_X_FP94BM Reporter.intensity.16     Unused    LCA9
#> 17        190321S_LCA10_X_FP97AG  Reporter.intensity.1    Carrier   LCA10
#> 18        190321S_LCA10_X_FP97AG  Reporter.intensity.2  Reference   LCA10
#> 19        190321S_LCA10_X_FP97AG  Reporter.intensity.3     Unused   LCA10
#> 20        190321S_LCA10_X_FP97AG  Reporter.intensity.4 Macrophage   LCA10
#> 21        190321S_LCA10_X_FP97AG  Reporter.intensity.5   Monocyte   LCA10
#> 22        190321S_LCA10_X_FP97AG  Reporter.intensity.6 Macrophage   LCA10
#> 23        190321S_LCA10_X_FP97AG  Reporter.intensity.7 Macrophage   LCA10
#> 24        190321S_LCA10_X_FP97AG  Reporter.intensity.8 Macrophage   LCA10
#> 25        190321S_LCA10_X_FP97AG  Reporter.intensity.9 Macrophage   LCA10
#> 26        190321S_LCA10_X_FP97AG Reporter.intensity.10 Macrophage   LCA10
#> 27        190321S_LCA10_X_FP97AG Reporter.intensity.11 Macrophage   LCA10
#> 28        190321S_LCA10_X_FP97AG Reporter.intensity.12     Unused   LCA10
#> 29        190321S_LCA10_X_FP97AG Reporter.intensity.13     Unused   LCA10
#> 30        190321S_LCA10_X_FP97AG Reporter.intensity.14     Unused   LCA10
#> 31        190321S_LCA10_X_FP97AG Reporter.intensity.15     Unused   LCA10
#> 32        190321S_LCA10_X_FP97AG Reporter.intensity.16     Unused   LCA10
#> 33  190914S_LCB3_X_16plex_Set_21  Reporter.intensity.1    Carrier    LCB3
#> 34  190914S_LCB3_X_16plex_Set_21  Reporter.intensity.2  Reference    LCB3
#> 35  190914S_LCB3_X_16plex_Set_21  Reporter.intensity.3     Unused    LCB3
#> 36  190914S_LCB3_X_16plex_Set_21  Reporter.intensity.4     Unused    LCB3
#> 37  190914S_LCB3_X_16plex_Set_21  Reporter.intensity.5 Macrophage    LCB3
#> 38  190914S_LCB3_X_16plex_Set_21  Reporter.intensity.6 Macrophage    LCB3
#> 39  190914S_LCB3_X_16plex_Set_21  Reporter.intensity.7      Blank    LCB3
#> 40  190914S_LCB3_X_16plex_Set_21  Reporter.intensity.8   Monocyte    LCB3
#> 41  190914S_LCB3_X_16plex_Set_21  Reporter.intensity.9 Macrophage    LCB3
#> 42  190914S_LCB3_X_16plex_Set_21 Reporter.intensity.10   Monocyte    LCB3
#> 43  190914S_LCB3_X_16plex_Set_21 Reporter.intensity.11      Blank    LCB3
#> 44  190914S_LCB3_X_16plex_Set_21 Reporter.intensity.12 Macrophage    LCB3
#> 45  190914S_LCB3_X_16plex_Set_21 Reporter.intensity.13 Macrophage    LCB3
#> 46  190914S_LCB3_X_16plex_Set_21 Reporter.intensity.14 Macrophage    LCB3
#> 47  190914S_LCB3_X_16plex_Set_21 Reporter.intensity.15 Macrophage    LCB3
#> 48  190914S_LCB3_X_16plex_Set_21 Reporter.intensity.16 Macrophage    LCB3
#> 49 190321S_LCA10_X_FP97_blank_01  Reporter.intensity.1      Blank   LCA10
#> 50 190321S_LCA10_X_FP97_blank_01  Reporter.intensity.2      Blank   LCA10
#> 51 190321S_LCA10_X_FP97_blank_01  Reporter.intensity.3      Blank   LCA10
#> 52 190321S_LCA10_X_FP97_blank_01  Reporter.intensity.4      Blank   LCA10
#> 53 190321S_LCA10_X_FP97_blank_01  Reporter.intensity.5      Blank   LCA10
#> 54 190321S_LCA10_X_FP97_blank_01  Reporter.intensity.6      Blank   LCA10
#> 55 190321S_LCA10_X_FP97_blank_01  Reporter.intensity.7      Blank   LCA10
#> 56 190321S_LCA10_X_FP97_blank_01  Reporter.intensity.8      Blank   LCA10
#> 57 190321S_LCA10_X_FP97_blank_01  Reporter.intensity.9      Blank   LCA10
#> 58 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.10      Blank   LCA10
#> 59 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.11      Blank   LCA10
#> 60 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.12      Blank   LCA10
#> 61 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.13      Blank   LCA10
#> 62 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.14      Blank   LCA10
#> 63 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.15      Blank   LCA10
#> 64 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.16      Blank   LCA10
#>    sortday digest
#> 1       s8      N
#> 2       s8      N
#> 3       s8      N
#> 4       s8      N
#> 5       s8      N
#> 6       s8      N
#> 7       s8      N
#> 8       s8      N
#> 9       s8      N
#> 10      s8      N
#> 11      s8      N
#> 12      s8      N
#> 13      s8      N
#> 14      s8      N
#> 15      s8      N
#> 16      s8      N
#> 17      s8      Q
#> 18      s8      Q
#> 19      s8      Q
#> 20      s8      Q
#> 21      s8      Q
#> 22      s8      Q
#> 23      s8      Q
#> 24      s8      Q
#> 25      s8      Q
#> 26      s8      Q
#> 27      s8      Q
#> 28      s8      Q
#> 29      s8      Q
#> 30      s8      Q
#> 31      s8      Q
#> 32      s8      Q
#> 33      s9      R
#> 34      s9      R
#> 35      s9      R
#> 36      s9      R
#> 37      s9      R
#> 38      s9      R
#> 39      s9      R
#> 40      s9      R
#> 41      s9      R
#> 42      s9      R
#> 43      s9      R
#> 44      s9      R
#> 45      s9      R
#> 46      s9      R
#> 47      s9      R
#> 48      s9      R
#> 49      s8   <NA>
#> 50      s8   <NA>
#> 51      s8   <NA>
#> 52      s8   <NA>
#> 53      s8   <NA>
#> 54      s8   <NA>
#> 55      s8   <NA>
#> 56      s8   <NA>
#> 57      s8   <NA>
#> 58      s8   <NA>
#> 59      s8   <NA>
#> 60      s8   <NA>
#> 61      s8   <NA>
#> 62      s8   <NA>
#> 63      s8   <NA>
#> 64      s8   <NA>The two tables are supplied to the readSCP function.
scp <- readSCP(featureData = mqScpData,
               colData = sampleAnnotation,
               channelCol = "Channel",
               batchCol = "Raw.file")
#> Loading data as a 'SingleCellExperiment' object
#> Splitting data based on 'Raw.file'
#> Formatting sample metadata (colData)
#> Formatting data as a 'QFeatures' objectAs indicated by the output on the console, readSCP proceeds as follows:
If featureData is the path to a CSV file, it reads the file using
read.csv. The table is converted to a SingleCellExperiment
object. readSCP needs to know in which field(s) the quantitative
data is stored. Those field name(s) is/are provided by the
channelCol field in the metaData table. So in this example, the
column names holding the quantitative data in mqScpData are
stored in the column named Channel in sampleAnnotation.
The SingleCellExperiment object is then split according to
batch. The split is performed depending on the batchCol field in
featureData, in this case the data will be split according to the
Raw.file column in mqScpData. Raw.file contains the names of the
acquisition runs that was used by MaxQuant to retrieve the raw data
files.
The sample metadata is generated from the supplied colData. Note
that in order for readSCP to correctly match the feature data
with the metadata, colData must contain the same batchCol field
with batch names.
Finally, the split feature data and the sample metadata are stored
in a single QFeatures object.
Here is a compact overview of the data:
scp
#> An instance of class QFeatures containing 4 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 395 rows and 16 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 487 rows and 16 columns 
#>  [3] 190321S_LCA10_X_FP97_blank_01: SingleCellExperiment with 109 rows and 16 columns 
#>  [4] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 370 rows and 16 columnsWe can see that the scp object we created is a QFeatures object
containing 4 assays. Each assay has an associated name, this is the
batch name that was used for splitting. We can also see that each
assay is a SingleCellExperiment object. The rows represent the
peptide to spectrum matches (PSMs), the number vary depending on the
batch. Finally, all three assays contains 16 columns that correspond
to the 16 TMT channels recorded during the 4 MS runs.
The SCoPE2 data we provide as an example contains runs that were
acquired with two TMT protocols. The 3 first assays were acquired
using the TMT-11 protocol and the last assay was acquired
using a TMT-16 protocol. When exporting the table, the authors
combined the data in a single table, were missing channels in the
TMT-11 data are filled with NA. This is essential when working in
table format, but since scp keeps the runs separated we can allow for
different number of channels per run. readSCP can automatically
detect and remove columns that contain only NAs, by supplying the
argument removeEmptyCols = TRUE.
scp <- readSCP(featureData = mqScpData,
               colData = sampleAnnotation,
               channelCol = "Channel",
               batchCol = "Raw.file",
               removeEmptyCols = TRUE)
#> Loading data as a 'SingleCellExperiment' object
#> Splitting data based on 'Raw.file'
#> Formatting sample metadata (colData)
#> Formatting data as a 'QFeatures' object
scp
#> An instance of class QFeatures containing 4 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 395 rows and 11 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 487 rows and 11 columns 
#>  [3] 190321S_LCA10_X_FP97_blank_01: SingleCellExperiment with 109 rows and 11 columns 
#>  [4] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 370 rows and 16 columnsSee here that the 3 first assays contain 11 columns that correspond to the TMT-11 labels and the last assay contains 16 columns that correspond to the TMT-16 labels.
All single-cell data contain many zeros. The zeros can be biological
zeros or technical zeros and differentiating between the two types is
not a trivial task. To avoid artefacts in downstream steps, we replace
the zeros by the missing value NA. The zeroIsNA function takes the
QFeatures object and the name(s) or index/indices of the assay(s) to
clean and automatically replaces any zero in the selected quantitative
data by NA.
scp <- zeroIsNA(scp, i = 1:4)A common steps in SCP is to filter out low-confidence PSMs. Each PSM
assay contains feature meta-information that are stored in the
rowData of the assays. The QFeatures package allows to quickly
filter the rows of an assay by using these information. The available
variables in the rowData are listed below for each assay.
rowDataNames(scp)
#> CharacterList of length 4
#> [["190222S_LCA9_X_FP94BM"]] uid Sequence Length ... residual participated
#> [["190321S_LCA10_X_FP97AG"]] uid Sequence Length ... residual participated
#> [["190321S_LCA10_X_FP97_blank_01"]] uid Sequence ... residual participated
#> [["190914S_LCB3_X_16plex_Set_21"]] uid Sequence ... residual participatedBelow are some examples of criteria that are used to identify low-confidence. The information is readily available since this was computed by MaxQuant:
We can perform this filtering using the filterFeatures function from
QFeatures. filterFeatures automatically accesses the feature
metadata and selects the rows that meet the provided condition(s). For
instance, Reverse != "+" keeps the rows for which the Reverse
variable in the rowData is not "+" (i.e. the PSM is not matched
to the decoy database).
scp <- filterFeatures(scp,
                      ~ Reverse != "+" &
                          Potential.contaminant != "+" &
                          !is.na(PIF) & PIF > 0.8)To avoid proceeding with failed runs, another interesting filter is to
remove assays with too few features. If a batch contains less than,
for example, 150 features we can then suspect something wrong happened
in that batch and it should be removed. Using dims, we can query the
dimensions (hence the number of features and the number of samples) of
all assays contained in the dataset.
dims(scp)
#>      190222S_LCA9_X_FP94BM 190321S_LCA10_X_FP97AG 190321S_LCA10_X_FP97_blank_01
#> [1,]                   283                    318                            60
#> [2,]                    11                     11                            11
#>      190914S_LCB3_X_16plex_Set_21
#> [1,]                          200
#> [2,]                           16Actually, a QFeatures object can be seen as a three-order array:
\(features \times samples \times assay\). Hence, QFeatures supports
three-order subsetting x[rows, columns, assays]. We first select the
assays that have sufficient PSMs (the number of rows is greater than
150), and then subset the scp object for the assays that meet the
criterion.
keepAssay <- dims(scp)[1, ] > 150
scp <- scp[, , keepAssay]
#> Warning: 'experiments' dropped; see 'metadata'
#> harmonizing input:
#>   removing 11 sampleMap rows not in names(experiments)
#>   removing 11 colData rownames not in sampleMap 'primary'
scp
#> An instance of class QFeatures containing 3 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 283 rows and 11 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 318 rows and 11 columns 
#>  [3] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 200 rows and 16 columnsNotice the 190321S_LCA10_X_FP97_blank_01 sample was removed because
it did not contain sufficient features, as expected from a blank
run. This could also have been the case for failed runs.
Another type of filtering is specific to SCP. In the SCoPE2 analysis, the authors suggest a filters based on the sample to carrier ratio (SCR), that is the reporter ion intensity of a single-cell sample divided by the reporter ion intensity of the carrier channel (200 cells) from the same batch. It is expected that the carrier intensities are much higher than the single-cell intensities.
The SCR can be computed using the computeSCR function from
scp. The function must be told which channels are the samples that
must be divided and which channel contains the carrier. This
information is provided in the sample metadata and is accessed using
the colData, under the SampleType field.
table(colData(scp)[, "SampleType"])
#> 
#>      Blank    Carrier Macrophage   Monocyte  Reference     Unused 
#>          3          3         20          5          3          4In this dataset, SampleType gives the type of sample that is present
in each TMT channel. The SCoPE2 protocole includes 5 types of samples:
Carrier) contain 200 cell equivalents and
are meant to boost the peptide identification rate.Reference) contain 5 cell equivalents
and are used to partially correct for between-run variation.Unused) are channels that are left empty due
to isotopic cross-contamination by the carrier channel.Blank) contain samples that do not contain any
cell but are processed as single-cell samples.Macrophage) or monocyte
(Monocyte).The computeSCR function expects the following input:
QFeatures datasetThe function creates a new field in the rowData of the assays. We
store the computed SCR in the rowData and call it MeanSCR.
scp <- computeSCR(scp,
                  i = 1:3,
                  colDataCol = "SampleType",
                  carrierPattern = "Carrier",
                  samplePattern = "Macrophage|Monocyte",
                  rowDataName = "MeanSCR")Before applying the filter, we plot the distribution of the average
SCR. We can extract the MeanSCR variable from the rowData of
several assays using the rowDataToDF. It takes the rowData
field(s) of interest and returns a DataFrame table.
scp %>%
    rowDataToDF(i = 1:3,
                vars = "MeanSCR") %>%
    data.frame %>%
    ggplot(aes(x = MeanSCR)) +
    geom_histogram() +
    geom_vline(xintercept = c(1/200, 0.1),
               lty = c(2, 1)) +
    scale_x_log10()The expected ratio between single cells and the carrier is 1/200
(dashed line). We can see that the distribution mode is slightly
shifted towards higher ratios with a mode around 0.01. However, there
are a few PSMs that stand out of the distribution and have a much
higher signal than expected, indicating something wrong happened
during the quantification of those PSMs. We therefore filter out PSMs
with an average SCR higher than 0.1 (solide line). This is again easily
performed using the filterFeatures functions.
scp <- filterFeatures(scp,
                      ~ !is.na(MeanSCR) &
                          MeanSCR < 0.1)Finally, we might also want to control for false discovery rate (FDR).
MaxQuant already computes posterior error probabilities (PEP), but
filtering on PEPs is too conservative (Käll et al. (2008)) so we provide the
pep2qvalue function to convert PEPs to q-values that are directly
related to FDR. We here compute the q-values from the PEP (dart_PEP)
across all 3 assays. dart_PEP contains the PEP values that have been
updated using the DART-ID algorithm (Chen, Franks, and Slavov (2019)). The function will
store the results in the rowData, we here asked to name the new
column qvalue_PSMs.
scp <- pep2qvalue(scp,
                  i = 1:3,
                  PEP = "dart_PEP",
                  rowDataName = "qvalue_PSMs")We also allow to compute q-values at peptide or protein level rather
than PSM. In this case, you need to supply the groupBy argument.
Suppose we want to compute the q-values at protein level, we can fetch
the protein information stored under protein in the rowData. This
time, we store the q-values in a new field called qvalue_proteins.
scp <- pep2qvalue(scp,
                  i = 1:3,
                  PEP = "dart_PEP",
                  groupBy = "protein",
                  rowDataName = "qvalue_proteins")We can now filter the PSM to control, let’s say, the protein FDR at
1%. This can be performed using filterFeatures because the q-values
were stored in the rowData.
scp <- filterFeatures(scp,
                      ~ qvalue_proteins < 0.01)In order to partialy correct for between-run variation, SCoPE2
suggests computing relative reporter ion intensities. This means that
intensities measured for single-cells are divided by the reference
channel containing 5-cell equivalents. We use the divideByReference
function that divides channels of interest by the reference
channel. Similarly to computeSCR, we can point to the samples and
the reference columns in each assay using the annotation contained in
the colData.
We here divide all columns (using the regular expression wildcard .)
by the reference channel (Reference).
scp <- divideByReference(scp,
                         i = 1:3,
                         colDataCol = "SampleType",
                         samplePattern = ".",
                         refPattern = "Reference")Now that the PSM assays are processed, we can aggregate them to
peptides. This is performed using the aggregateFeaturesOverAssays
function. For each assay, the function aggregates several PSMs into a
unique peptide. This is best illustrated by the figure below.
(#fig:features_aggregation)Conceptual illustration of feature aggregation.
Remember there currently are three assays containing the PSM data.
scp
#> An instance of class QFeatures containing 3 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 239 rows and 11 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 277 rows and 11 columns 
#>  [3] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 186 rows and 16 columnsThe PSMs are aggregated over the fcol feature variable, here
peptides. We also need to supply an aggregating function that will
tell how to combine the quantitative data of the PSMs to aggregate. We
here aggregate the PSM data using the median value per sample thanks
to the matrixStats:colMedians function. Other functions can be used
and we refer to ?aggregateFeatures for more information about
available aggregation functions. The aggregateFeaturesOverAssays
function will create a new assay for each aggregated assay. We name
the aggregated assays using the original names and appending
peptides_ at the start.
scp <- aggregateFeaturesOverAssays(scp,
                                   i = 1:3,
                                   fcol = "peptide",
                                   name = paste0("peptides_", names(scp)),
                                   fun = matrixStats::colMedians, na.rm = TRUE)Notice that 3 new assays were created in the scp object. Those new
assays contain the aggregated features while the three first assays
are unchanged. This allows to keep track of the data processing.
scp
#> An instance of class QFeatures containing 6 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 239 rows and 11 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 277 rows and 11 columns 
#>  [3] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 186 rows and 16 columns 
#>  [4] peptides_190222S_LCA9_X_FP94BM: SingleCellExperiment with 239 rows and 11 columns 
#>  [5] peptides_190321S_LCA10_X_FP97AG: SingleCellExperiment with 277 rows and 11 columns 
#>  [6] peptides_190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 185 rows and 16 columnsUnder the hood, the QFeatures architecture preserves the
relationship between the aggregated assays. See ?AssayLinks for more
information on relationships between assays.
Up to now, we kept the data belonging to each MS run in separate
assays. We now combine all batches into a single assay. This is done
using the joinAssays function from the QFeatures package. Note
that we now use the aggregated assays, so assay 4 to 6.
scp <- joinAssays(scp,
                  i = 4:6,
                  name = "peptides")In this case, one new assay is created in the scp object that
combines the data from assay 4 to 6. The samples are always distinct
so the number of column in the new assay (here \(48\)) will always
equals the sum of the columns in the assays to join (here \(16 + 16 + 16\)). The feature in the joined assay might contain less features than
the sum of the rows of the assays to join since common features
between assays are joined in a single row.
scp
#> An instance of class QFeatures containing 7 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 239 rows and 11 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 277 rows and 11 columns 
#>  [3] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 186 rows and 16 columns 
#>  [4] peptides_190222S_LCA9_X_FP94BM: SingleCellExperiment with 239 rows and 11 columns 
#>  [5] peptides_190321S_LCA10_X_FP97AG: SingleCellExperiment with 277 rows and 11 columns 
#>  [6] peptides_190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 185 rows and 16 columns 
#>  [7] peptides: SingleCellExperiment with 432 rows and 38 columnsAnother common step in single-cell data analysis pipelines is to remove low-quality cells. After subsetting for the samples of interest, we will use 2 metrics: the median relative intensities per cell and the median coefficient of variation (CV) per cell.
We can subset the cells of interest, that is the blank samples
(sc_0), the macrophages (sc_m0) and the monocytes (sc_u). This
can easily be done by taking advantage of the colData and the
subsetting operators. Recall that QFeatures objects support
three-order subsetting, x[rows, columns, assays], where columns are
the samples of interest.
scp
#> An instance of class QFeatures containing 7 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 239 rows and 11 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 277 rows and 11 columns 
#>  [3] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 186 rows and 16 columns 
#>  [4] peptides_190222S_LCA9_X_FP94BM: SingleCellExperiment with 239 rows and 11 columns 
#>  [5] peptides_190321S_LCA10_X_FP97AG: SingleCellExperiment with 277 rows and 11 columns 
#>  [6] peptides_190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 185 rows and 16 columns 
#>  [7] peptides: SingleCellExperiment with 432 rows and 38 columns
scp <- scp[, scp$SampleType %in% c("Blank", "Macrophage", "Monocyte"), ]The subsetting removes unwanted samples from all assays. The filtered data set contains the same number of assays with the same number of features, but the number of columns (hence sampled) decreased.
scp
#> An instance of class QFeatures containing 7 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 239 rows and 8 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 277 rows and 8 columns 
#>  [3] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 186 rows and 12 columns 
#>  [4] peptides_190222S_LCA9_X_FP94BM: SingleCellExperiment with 239 rows and 8 columns 
#>  [5] peptides_190321S_LCA10_X_FP97AG: SingleCellExperiment with 277 rows and 8 columns 
#>  [6] peptides_190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 185 rows and 12 columns 
#>  [7] peptides: SingleCellExperiment with 432 rows and 28 columnsWe compute the median relative reporter ion intensity for each cell
separately and apply a filter based on this statistic. This procedure
recalls that of library size filtering commonly performed in scRNA-Seq
data analysis, where the library size is the sum of the counts in each
single cell. We will store the median intensity in the colData. The
medians are computed for every sample from the quantitative data using
the apply function. The data matrix can be extracted from a
SingleCellExperiment using the assay function. Setting
MARGIN = 2 means that we compute the median by column.
scp[["peptides"]] %>%
    assay %>%
    apply(MARGIN = 2, 
          FUN = median, 
          na.rm = TRUE) ->
    medians
head(medians)
#> 190222S_LCA9_X_FP94BMReporter.intensity.4 
#>                                 0.7572816 
#> 190222S_LCA9_X_FP94BMReporter.intensity.5 
#>                                 0.4766553 
#> 190222S_LCA9_X_FP94BMReporter.intensity.6 
#>                                 0.8314786 
#> 190222S_LCA9_X_FP94BMReporter.intensity.7 
#>                                 0.4127720 
#> 190222S_LCA9_X_FP94BMReporter.intensity.8 
#>                                 1.0302978 
#> 190222S_LCA9_X_FP94BMReporter.intensity.9 
#>                                 0.8830915The computed medians are then inserted in the colData.
colData(scp)[names(medians), "MedianRI"] <- mediansLooking at the distribution of the median per cell can highlight low-quality cells.
scp %>%
    colData %>%
    data.frame %>%
    ggplot() +
    aes(x = MedianRI, 
        y = SampleType,
        fill = SampleType) +
    geom_boxplot() +
    scale_x_log10()The blanks samples should not contain any peptide information and are therefore used to assess the amount of background signal. The graph above confirms that the signal measured in single-cells (macrophages and monocytes) is above the background signal, hence no filtering is needed. Would it not be the case, the same procedure as in the previous section can be used for selecting the cells that have an associated median RI lower that a defined threshold.
The median CV measures the consistency of quantification for a group
of peptides that belong to a protein. We remove cells that exhibit
high median CV over the different proteins. We compute the median CV
per cell using the computeMedianCV function from the scp
package. The function takes the peptides assay and computes the CV
for each protein in each cell. To perform this, we must supply the
name of the rowData field that contains the protein information
through the groupBy argument. We also only want to compute CVs if we
have at least 5 peptides per protein. Finally, we also perform a
normalization and divide the columns by the median. The computed
median CVs are automatically stored in the colData under the name
that is supplied, here MedianCV.
scp <- medianCVperCell(scp,
                       i = 1:3,
                       groupBy = "protein",
                       nobs = 5, 
                       norm = "div.median",
                       na.rm = TRUE,
                       colDataName = "MedianCV")The computed CVs are stored in the colData of the peptides
assay and holds the median CV per cell computed using at least 5
observations (peptides). The main interest of computing the median CV
per cell is to filter cells with reliable quantification. The blank
samples are not expected to have reliable quantifications and hence
can be used to estimate an empirical null distribution of the CV. This
distribution helps defining a threshold that filters out single-cells
that contain noisy quantification.
scp %>%
    getWithColData("peptides") %>%
    colData %>%
    data.frame %>%
    ggplot(aes(x = MedianCV,
               fill = SampleType)) +
    geom_boxplot() +
    geom_vline(xintercept = 0.65)We can see that the protein quantification for single-cells are much more consistent than for blank samples. Based on the distribution of the blanks, we decide to keep the cells that have a median CV lower than 0.65. Note this example is inaccurate because the null distribution is based on only 3 blank samples, but more sets could lead to a better estimation of the CV null distribution.
scp <- scp[, !is.na(scp$MedianCV) & scp$MedianCV < 0.65, ]We can now remove the blank samples since all QC metrics are now computed.
scp <- scp[, scp$SampleType != "Blank", ]In this vignette, the peptide data are further processed before aggregation to proteins. The steps are: normalization, filter peptides based on missing data and log-transformation.
The columns (samples) of the peptide data are first normalized by
dividing the relative intensities by the median relative
intensities. Then, the rows (peptides) are normalized by dividing the
relative intensities by the mean relative intensities. The normalized
data is stored in a separate assay. This normalization procedure is
suggested in the SCoPE2 analysis and is applied using the sweep
method. Beside the dataset and the assay to normalize, the method
expects a MARGIN, that is either row-wise (1) or column-wise (2)
transformation, the FUN function to apply and STATS, a vector of
values to apply. More conventional normalization procedure can be
found in ?QFeatures::normalize.
## Divide columns by median
scp <- sweep(scp, 
             i = "peptides",
             MARGIN = 2,
             FUN = "/",
             STATS = colMedians(assay(scp[["peptides"]]), na.rm = TRUE),
             name = "peptides_norm_col")
## Divide rows by mean
scp <- sweep(scp,
             i = "peptides_norm_col",
             MARGIN = 1,
             FUN = "/",
             STATS = rowMeans(assay(scp[["peptides_norm_col"]]),  na.rm = TRUE),
             name = "peptides_norm")Notice each call to sweep created a new assay.
scp
#> An instance of class QFeatures containing 9 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 239 rows and 6 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 277 rows and 6 columns 
#>  [3] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 186 rows and 6 columns 
#>  ...
#>  [7] peptides: SingleCellExperiment with 432 rows and 18 columns 
#>  [8] peptides_norm_col: SingleCellExperiment with 432 rows and 18 columns 
#>  [9] peptides_norm: SingleCellExperiment with 432 rows and 18 columnsLet’s point out
Peptides that contain many missing values are not
informative. Therefore, another common procedure is to remove higly
missing data. In this example, we remove peptides with more than 99 %
missing data. This is done using the filterNA function from
QFeatures.
scp <- filterNA(scp,
                i = "peptides_norm",
                pNA = 0.99)In this vignette, we perform log2-transformation using the
logTransform method from QFeatures. Other log-transformation can
be applied by changing the base argument.
scp <- logTransform(scp,
                    base = 2,
                    i = "peptides_norm",
                    name = "peptides_log")Similarly to sweep, logTransform creates a new assay in scp.
Similarly to aggregating PSM data to peptide data, we can aggregate
peptide data to protein data using the aggregateFeatures function.
scp <- aggregateFeatures(scp,
                         i = "peptides_log",
                         name = "proteins",
                         fcol = "protein",
                         fun = matrixStats::colMedians, na.rm = TRUE)
#> Your quantitative and row data contain missing values. Please read the
#> relevant section(s) in the aggregateFeatures manual page regarding the
#> effects of missing values on data aggregation.The only difference between aggregateFeatures and
aggregateFeaturesOverAssays is that the second function can
aggregate several assay at once whereas the former only takes one
assay to aggregate. Hence, only a single assay, proteins, was
created in the scp object.
scp
#> An instance of class QFeatures containing 11 assays:
#>  [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 239 rows and 6 columns 
#>  [2] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 277 rows and 6 columns 
#>  [3] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 186 rows and 6 columns 
#>  ...
#>  [9] peptides_norm: SingleCellExperiment with 413 rows and 18 columns 
#>  [10] peptides_log: SingleCellExperiment with 413 rows and 18 columns 
#>  [11] proteins: SingleCellExperiment with 94 rows and 18 columnsAfter the second aggregation, the proteins assay in this example
contains quantitative information for 89 proteins in 15 single-cells.
The protein data is further processed in three steps: normalization,
imputation (using the KNN algorithm) and batch correction (using the
ComBat algorithm).
Normalization is performed similarly to peptide normalization. We use the same functions, but since the data were log-transformed at the peptide level, we subtract by the statistic (median or mean) instead of dividing.
scp %>%
    ## Center columns with median
    sweep(i = "proteins",
          MARGIN = 2,
          FUN = "-",
          STATS = colMedians(assay(scp[["proteins"]]),
                             na.rm = TRUE),
          name = "proteins_norm_col") %>%
    ## Center rows with mean
    sweep(i = "proteins_norm_col",
          MARGIN = 1,
          FUN = "-",
          STATS = rowMeans(assay(.[["proteins_norm_col"]]),
                           na.rm = TRUE),
          name = "proteins_norm") ->
    scpThe protein data contains a lot of missing values.
scp[["proteins_norm"]] %>%
    assay %>%
    is.na %>%
    mean
#> [1] 0.3291962The average missingness in the proteins assay is around 25
%. Including more samples and hence more batches can increase the
missingness up to 70 % as seen for the complete SCoPE2 dataset
(Specht et al. (2021)). Whether imputation is beneficial or deleterious for
the data will not be discussed in this vignette. But taking those
missing value into account is essential to avoid artefacts in
downstream analyses. The data imputation is performed using the K
nearest neighbors algorithm, with k = 3. This is available from the
impute mehtod. More details about the arguments can be found in
?impute::impute.knn.
scp <- impute(scp,
              i = "proteins_norm",
              method = "knn",
              k = 3, rowmax = 1, colmax= 1,
              maxp = Inf, rng.seed = 1234)
#> Loading required namespace: imputeNote that after imputation, no value are missing.
scp[["proteins_norm"]] %>%
    assay %>%
    is.na %>%
    mean
#> [1] 0A very important step for processing SCP data is to correct for batch effects. Batch effects are caused by technical variation occuring during different MS runs. Since only a small number of single-cells can be acquired at once, batch effects are unavoidable.
The ComBat function from the sva package can be used to perform
batch correction as it is performed in the SCoPE2 analysis. We do not
claim that ComBat is the best algorithm for batch correcting SCP
data and other batch correcting methods could be used using the same
procedure.
We first extract the assay to process.
sce <- getWithColData(scp, "proteins_norm")
#> Warning: 'experiments' dropped; see 'metadata'
#> harmonizing input:
#>   removing 144 sampleMap rows not in names(experiments)Next, we need to provide a design matrix and the batch annotation to
Combat. The design matrix allows to protect variables of interest,
in our case SampleType.
batch <- colData(sce)$Raw.file
model <- model.matrix(~ SampleType, data = colData(sce))We then load and call ComBat and overwrite the data matrix. Recall
the data matrix can be accessed using the assay function.
library(sva)
assay(sce) <- ComBat(dat = assay(sce),
                     batch = batch,
                     mod = model)Finally, we add the batch corrected assay to the QFeatures object
and create the feature links.
addAssay(scp,
         y = sce,
         name = "proteins_batchC") %>%
    addAssayLinkOneToOne(from = "proteins_norm",
                         to = "proteins_batchC") ->
    scpBecause each assay contains SingelCellExperiment objects, we can
easily apply methods developed in the scRNA-Seq field. A useful
package for dimension reduction on single-cell data is the scater.
library(scater)
#> Loading required package: SingleCellExperiment
#> Loading required package: scuttleThis package provides streamline functions to computes various dimension reduction such as PCA, UMAP, t-SNE, NMF, MDS, ….
PCA can be computed using the runPCA method. It returns a
SingleCellExperiment object for which the dimension reduction
results are stored in the reducedDim slot.
scp[["proteins_batchC"]] <- runPCA(scp[["proteins_batchC"]],
                                   ncomponents = 5,
                                   ntop = Inf,
                                   scale = TRUE,
                                   exprs_values = 1,
                                   name = "PCA")The computed PCA can be displayed using the plotReducedDim function. The
dimred arguments should give the name of the dimension reduction results to
plot, here we called it PCA. The samples are colored by type of sample.
plotReducedDim(scp[["proteins_batchC"]],
               dimred = "PCA",
               colour_by = "SampleType",
               point_alpha = 1)This is a minimalistic example with only a few plotted cells, but the original SCoPE2 dataset contained more than thousand cells.
Similarly to PCA, we can compute a UMAP using the runUMAP
method. Note however that the UMAP implementation requires a
initialization, usually provided by PCA. The previous PCA results are
used automatically when supplying dimred = "PCA" (PCA is the name
of the dimension reduction result that we supplied in the previous
section).
scp[["proteins_batchC"]] <- runUMAP(scp[["proteins_batchC"]],
                                    ncomponents = 2,
                                    ntop = Inf,
                                    scale = TRUE,
                                    exprs_values = 1,
                                    n_neighbors = 3,
                                    dimred = "PCA",
                                    name = "UMAP")The computed UMAP can be displayed using the plotReducedDim
function. The dimred arguments gives the name of the dimension
reduction results to plot, here we called it UMAP. The samples are
colored by type of sample.
plotReducedDim(scp[["proteins_batchC"]],
               dimred = "UMAP",
               colour_by = "SampleType",
               point_alpha = 1)The UMAP plot is a very interesting plot for large datasets. A UMAP on this small example dataset is not useful but is shown for illustration.
The QFeatures plot shows the quantitative data for a features at the
different expression levels. For instance, suppose we are interested
in the protein Plastin-2 (protein ID is P13796). A useful QC is to
monitor the data processing at the PSM, peptide and protein
level. This can easily be done thanks to the QFeatures
framework. Using the subsetByFeature, we can extract the protein of
interest and its related features in the other assays. The data is
formatted to a long format table that can easily be plugged in the
ggplot2 visualization tool.
scp %>%
    ## Get the features related to Plastin-2 (P13796)
    subsetByFeature("P13796") %>%
    ## Format the `QFeatures` to a long format table
    longFormat(colDataCols = c("Raw.file", "SampleType", "Channel")) %>%
    data.frame %>%
    ## This is used to preserve ordering of the samples and assays in ggplot2
    mutate(assay = factor(assay, levels = names(scp)),
           Channel = sub("Reporter.intensity.", "TMT-", Channel),
           Channel = factor(Channel, levels = unique(Channel))) %>%
    ## Start plotting
    ggplot(aes(x = Channel, y = value, group = rowname, col = SampleType)) +
    geom_point() +
    ## Plot every assay in a separate facet
    facet_wrap(facets = vars(assay), scales = "free_y", ncol = 3) +
    ## Annotate plot
    xlab("Channels") +
    ylab("Intensity (arbitrary units)") +
    ## Improve plot aspect
    theme(axis.text.x = element_text(angle = 90),
          strip.text = element_text(hjust = 0),
          legend.position = "bottom")This graph helps to keep track of the data processing. We can see how the different PSMs are progressively aggregated to peptides and then to proteins as well as how the normalization, imputation or batch correction impact the distribution of the quantifications.
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB              LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     
other attached packages:
 [1] scater_1.20.0               scuttle_1.2.0              
 [3] SingleCellExperiment_1.14.0 sva_3.40.0                 
 [5] BiocParallel_1.26.0         genefilter_1.74.0          
 [7] mgcv_1.8-35                 nlme_3.1-152               
 [9] dplyr_1.0.6                 magrittr_2.0.1             
[11] ggplot2_3.3.3               scp_1.2.0                  
[13] QFeatures_1.2.0             MultiAssayExperiment_1.18.0
[15] SummarizedExperiment_1.22.0 Biobase_2.52.0             
[17] GenomicRanges_1.44.0        GenomeInfoDb_1.28.0        
[19] IRanges_2.26.0              S4Vectors_0.30.0           
[21] BiocGenerics_0.38.0         MatrixGenerics_1.4.0       
[23] matrixStats_0.58.0          BiocStyle_2.20.0           
loaded via a namespace (and not attached):
 [1] ggbeeswarm_0.6.0          colorspace_2.0-1         
 [3] ellipsis_0.3.2            XVector_0.32.0           
 [5] BiocNeighbors_1.10.0      clue_0.3-59              
 [7] farver_2.1.0              bit64_4.0.5              
 [9] RSpectra_0.16-0           AnnotationDbi_1.54.0     
[11] fansi_0.4.2               splines_4.1.0            
[13] sparseMatrixStats_1.4.0   cachem_1.0.5             
[15] impute_1.66.0             knitr_1.33               
[17] jsonlite_1.7.2            annotate_1.70.0          
[19] cluster_2.1.2             png_0.1-7                
[21] uwot_0.1.10               BiocManager_1.30.15      
[23] compiler_4.1.0            httr_1.4.2               
[25] assertthat_0.2.1          Matrix_1.3-3             
[27] fastmap_1.1.0             lazyeval_0.2.2           
[29] limma_3.48.0              BiocSingular_1.8.0       
[31] htmltools_0.5.1.1         tools_4.1.0              
[33] rsvd_1.0.5                gtable_0.3.0             
[35] glue_1.4.2                GenomeInfoDbData_1.2.6   
[37] Rcpp_1.0.6                jquerylib_0.1.4          
[39] vctrs_0.3.8               Biostrings_2.60.0        
[41] DelayedMatrixStats_1.14.0 xfun_0.23                
[43] stringr_1.4.0             beachmat_2.8.0           
[45] lifecycle_1.0.0           irlba_2.3.3              
[47] XML_3.99-0.6              edgeR_3.34.0             
[49] zlibbioc_1.38.0           MASS_7.3-54              
[51] scales_1.1.1              ProtGenerics_1.24.0      
[53] AnnotationFilter_1.16.0   yaml_2.2.1               
[55] gridExtra_2.3             memoise_2.0.0            
[57] sass_0.4.0                stringi_1.6.2            
[59] RSQLite_2.2.7             highr_0.9                
[61] ScaledMatrix_1.0.0        rlang_0.4.11             
[63] pkgconfig_2.0.3           bitops_1.0-7             
[65] evaluate_0.14             lattice_0.20-44          
[67] purrr_0.3.4               labeling_0.4.2           
[69] cowplot_1.1.1             bit_4.0.4                
[71] tidyselect_1.1.1          bookdown_0.22            
[73] R6_2.5.0                  generics_0.1.0           
[75] DelayedArray_0.18.0       DBI_1.1.1                
[77] pillar_1.6.1              withr_2.4.2              
[79] MsCoreUtils_1.4.0         survival_3.2-11          
[81] KEGGREST_1.32.0           RCurl_1.98-1.3           
[83] tibble_3.1.2              crayon_1.4.1             
[85] utf8_1.2.1                rmarkdown_2.8            
[87] viridis_0.6.1             locfit_1.5-9.4           
[89] grid_4.1.0                FNN_1.1.3                
[91] blob_1.2.1                digest_0.6.27            
[93] xtable_1.8-4              munsell_0.5.0            
[95] viridisLite_0.4.0         beeswarm_0.3.1           
[97] vipor_0.4.5               bslib_0.2.5.1            Amezquita, Robert A, Aaron T L Lun, Etienne Becht, Vince J Carey, Lindsay N Carpp, Ludwig Geistlinger, Federico Martini, et al. 2019. “Orchestrating Single-Cell Analysis with Bioconductor.” Nat. Methods, December, 1–9.
Chen, Albert Tian, Alexander Franks, and Nikolai Slavov. 2019. “DART-ID Increases Single-Cell Proteome Coverage.” PLoS Comput. Biol. 15 (7): e1007082.
Gatto, Laurent. 2020. “QFeatures: Quantitative Features for Mass Spectrometry Data.”
Käll, Lukas, John D Storey, Michael J MacCoss, and William Stafford Noble. 2008. “Posterior Error Probabilities and False Discovery Rates: Two Sides of the Same Coin.” J. Proteome Res. 7 (1): 40–44.
Specht, Harrison, Edward Emmott, Aleksandra A Petelski, R Gray Huffman, David H Perlman, Marco Serra, Peter Kharchenko, Antonius Koller, and Nikolai Slavov. 2021. “Single-Cell Proteomic and Transcriptomic Analysis of Macrophage Heterogeneity Using SCoPE2.” Genome Biol. 22 (1): 50.
Tyanova, Stefka, Tikira Temu, and Juergen Cox. 2016. “The MaxQuant Computational Platform for Mass Spectrometry-Based Shotgun Proteomics.” Nat. Protoc. 11 (12): 2301–19.