How to use DeeDeeExperiment with single-cell data
DeeDeeExperiment 1.1.2
muscat & DeeDeeExperimentDeeDeeExperiment on the Kang datasetIn this vignette, we illustrate how to integrate Differential Expression Analysis (DEA) results generated with the muscat framework into a DeeDeeExperiment object. As an example, we use a publicly available dataset from Kang, et al. “Multiplexed droplet single-cell RNA-sequencing using natural genetic variation”, published in Nature Biotechnology, December 2017 (Kang et al. 2017)(https://doi.org/10.1038/nbt.4042)
The data is made available via the ExperimentHub Bioconductor package as a SingleCellExperiment object containing scRNA-seq data from PBMCs obtained from 8 lupus patients before and after IFNβ stimulation.
For demonstration purposes we adapt parts of the code from the original muscat vignette)
library("DeeDeeExperiment")
library("ExperimentHub")
library("scater")
library("muscat")
library("limma")
We begin by creating an ExperimentHub instance, which provides access to
curated datasets stored in the Bioconductor cloud. Using query(), we filter
available records for entries matching the keyword “Kang”, and then load the
dataset of interest using its accession ID “EH2259”.
# retrieve the data
eh <- ExperimentHub()
query(eh, "Kang")
#> ExperimentHub with 3 records
#> # snapshotDate(): 2025-12-03
#> # $dataprovider: NCI_GDC, GEO
#> # $species: Homo sapiens
#> # $rdataclass: character, SingleCellExperiment, BSseq
#> # additional mcols(): taxonomyid, genome, description,
#> # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> # rdatapath, sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["EH1661"]]'
#>
#> title
#> EH1661 | Whole Genome Bisulfit Sequencing Data for 47 samples
#> EH1662 | Whole Genome Bisulfit Sequencing Data for 47 samples
#> EH2259 | Kang18_8vs8
sce <- eh[["EH2259"]]
Before running muscat, we perform standard single-cell preprocessing steps: removing undetected genes, filtering low-quality cells using scater, and removing lowly expressed genes.
# remove undetected genes
sce <- sce[rowSums(counts(sce) > 0) > 0, ]
# calculate per-cell quality control (QC) metrics
qc <- perCellQCMetrics(sce)
# remove cells with few or many detected genes
ol <- isOutlier(metric = qc$detected, nmads = 2, log = TRUE)
sce <- sce[, !ol]
dim(sce)
#> [1] 18890 26820
# remove lowly expressed genes
sce <- sce[rowSums(counts(sce) > 1) >= 10, ]
dim(sce)
#> [1] 7118 26820
# compute sum-factors & normalize
sce <- computeLibraryFactors(sce)
sce <- logNormCounts(sce)
We then prepare the data for muscat. The package expects a certain format of
the input SCE. Specifically, the following cell metadata (colData) columns have
to be provided:
sample_id : unique sample identifierscluster_id : subpopulation (cluster) assignmentsgroup_id : experimental group/condition# data preparation
sce$id <- paste0(sce$stim, sce$ind)
(sce <- prepSCE(sce,
kid = "cell", # subpopulation assignments
gid = "stim", # group IDs (ctrl/stim)
sid = "id", # sample IDs (ctrl/stim.1234)
drop = TRUE)) # drop all other colData columns
#> class: SingleCellExperiment
#> dim: 7118 26820
#> metadata(1): experiment_info
#> assays(2): counts logcounts
#> rownames(7118): NOC2L HES4 ... S100B PRMT2
#> rowData names(2): ENSEMBL SYMBOL
#> colnames(26820): AAACATACAATGCC-1 AAACATACATTTCC-1 ... TTTGCATGGTTTGG-1
#> TTTGCATGTCTTAC-1
#> colData names(3): cluster_id sample_id group_id
#> reducedDimNames(1): TSNE
#> mainExpName: NULL
#> altExpNames(0):
We compute UMAP for visualization. The dataset already includes precomputed TSNE coordinates, so we only run UMAP here.
# compute UMAP using 1st 20 PCs
sce <- runUMAP(sce, pca = 20)
Then we aggregate measurements for each sample (in each cluster) to obtain pseudobulk data
# aggregate by cell type
pb <- aggregateData(
sce,
assay = "counts",
fun = "sum",
by = c("cluster_id", "sample_id")
)
assayNames(pb)
#> [1] "B cells" "CD14+ Monocytes" "CD4 T cells"
#> [4] "CD8 T cells" "Dendritic cells" "FCGR3A+ Monocytes"
#> [7] "Megakaryocytes" "NK cells"
And construct the contrast matrix
# construct design & contrast matrix
ei <- metadata(sce)$experiment_info
mm <- model.matrix(~ 0 + ei$group_id)
dimnames(mm) <- list(ei$sample_id, levels(ei$group_id))
contrast <- makeContrasts("stim-ctrl", levels = mm)
With the pseudobulk data assembled, we can now test for differential state (DS)
using pbDS
# run DS analysis
muscat_res <- pbDS(pb, design = mm, contrast = contrast)
#>
|
| | 0%
|
|========= | 12%
|
|================== | 25%
|
|========================== | 38%
|
|=================================== | 50%
|
|============================================ | 62%
|
|==================================================== | 75%
|
|============================================================= | 88%
|
|======================================================================| 100%
names(muscat_res$table[["stim-ctrl"]])
#> [1] "B cells" "CD14+ Monocytes" "CD4 T cells"
#> [4] "CD8 T cells" "Dendritic cells" "FCGR3A+ Monocytes"
#> [7] "Megakaryocytes" "NK cells"
Now we integrate the output of muscat in muscat_list_for_dde() to transform it
into a format accepted by DeeDeeExperiment
# preparing the results as muscat list
muscat_list <- muscat_list_for_dde(res = list(`stim-ctrl` = muscat_res),
padj_col = "p_adj.loc")
Finally the results can be directly added to a new dde object, or to an
existing one.
# create dde
dde <- DeeDeeExperiment(sce = sce,
de_results = muscat_list)
dde
#> class: DeeDeeExperiment
#> dim: 7118 26820
#> metadata(3): experiment_info singlecontrast version
#> assays(2): counts logcounts
#> rownames(7118): NOC2L HES4 ... S100B PRMT2
#> rowData names(26): ENSEMBL SYMBOL ... stim-ctrl_NK cells_pvalue
#> stim-ctrl_NK cells_padj
#> colnames(26820): AAACATACAATGCC-1 AAACATACATTTCC-1 ... TTTGCATGGTTTGG-1
#> TTTGCATGTCTTAC-1
#> colData names(3): cluster_id sample_id group_id
#> reducedDimNames(2): TSNE UMAP
#> mainExpName: NULL
#> altExpNames(0):
#> dea(8): stim-ctrl_B cells, stim-ctrl_CD14+ Monocytes, stim-ctrl_CD4 T cells, stim-ctrl_CD8 T cells, stim-ctrl_Dendritic cells, stim-ctrl_FCGR3A+ Monocytes, stim-ctrl_Megakaryocytes, stim-ctrl_NK cells
#> fea(0):
As for the other DeeDeeExperiment objects, we can call some specific methods to extract/retrieve/integrate some information.
We can extract the names of the DEA included:
# check DEAs
getDEANames(dde)
#> [1] "stim-ctrl_B cells" "stim-ctrl_CD14+ Monocytes"
#> [3] "stim-ctrl_CD4 T cells" "stim-ctrl_CD8 T cells"
#> [5] "stim-ctrl_Dendritic cells" "stim-ctrl_FCGR3A+ Monocytes"
#> [7] "stim-ctrl_Megakaryocytes" "stim-ctrl_NK cells"
Also, we can directly retrieve the content itself of each DEA by typing
# retrieve results
getDEA(dde,
dea_name = "stim-ctrl_NK cells")|> head()
#> DataFrame with 6 rows and 3 columns
#> stim-ctrl_NK cells_log2FoldChange stim-ctrl_NK cells_pvalue
#> <numeric> <numeric>
#> NOC2L NA NA
#> HES4 NA NA
#> ISG15 4.714773 6.66734e-20
#> TNFRSF18 -0.583992 6.47700e-02
#> TNFRSF4 -0.220578 4.98028e-01
#> SDF4 -0.524322 1.77759e-02
#> stim-ctrl_NK cells_padj
#> <numeric>
#> NOC2L NA
#> HES4 NA
#> ISG15 2.88696e-17
#> TNFRSF18 1.63055e-01
#> TNFRSF4 6.65061e-01
#> SDF4 6.29608e-02
getDEA(dde,
dea_name = "stim-ctrl_CD14+ Monocytes",
format = "original") |> head()
#> gene cluster_id log2FoldChange logCPM F
#> HES4 HES4 CD14+ Monocytes 6.4921975 7.975074 305.569411
#> ISG15 ISG15 CD14+ Monocytes 7.0240366 14.751515 232.635530
#> SDF4 SDF4 CD14+ Monocytes -0.7242092 5.446124 15.386517
#> UBE2J2 UBE2J2 CD14+ Monocytes -0.8153555 5.351035 23.424243
#> CPSF3L CPSF3L CD14+ Monocytes -0.6688228 4.337862 7.582583
#> AURKAIP1 AURKAIP1 CD14+ Monocytes -0.5741367 7.389181 34.562176
#> pvalue padj p_adj.glb contrast
#> HES4 2.883448e-14 1.398472e-12 1.686302e-12 stim-ctrl
#> ISG15 6.200271e-13 1.606550e-11 2.344791e-11 stim-ctrl
#> SDF4 7.676618e-04 1.818450e-03 3.340543e-03 stim-ctrl
#> UBE2J2 8.502381e-05 2.510891e-04 4.817526e-04 stim-ctrl
#> CPSF3L 1.187149e-02 2.083016e-02 3.426077e-02 stim-ctrl
#> AURKAIP1 7.378159e-06 2.880451e-05 5.506716e-05 stim-ctrl
General info on the DEA performed can be shown with getDEAInfo()
# retrieving the DEA information
dea_name <- "stim-ctrl_B cells"
getDEAInfo(dde)[[dea_name]][["package"]]
#> [1] "muscat"
getDEAInfo(dde)[[dea_name]][["package_version"]]
#> [1] "1.25.0"
To add some information on the scenario under investigation, we can use addScenarioInfo().
This can be e.g. later processed as a contextually relevant bit if a Large Language Model is used to interact with this object.
# adding info on the scenario under investigation
dde <- addScenarioInfo(dde,
dea_name = "stim-ctrl_Dendritic cells",
info = "This result contains the output of pseudobulk DE analysis performed on dendritic cells, comparing untreated samples to those stimulated with IFNβ")
As usual, the summary() method can be called to obtain a quick overview on all performed analyses.
summary(dde, show_scenario_info = TRUE)
#> DE Results Summary:
#> DEA_name Up Down FDR
#> stim-ctrl_B cells 345 271 0.05
#> stim-ctrl_CD14+ Monocytes 1121 1289 0.05
#> stim-ctrl_CD4 T cells 642 516 0.05
#> stim-ctrl_CD8 T cells 178 115 0.05
#> stim-ctrl_Dendritic cells 113 70 0.05
#> stim-ctrl_FCGR3A+ Monocytes 498 413 0.05
#> stim-ctrl_Megakaryocytes 29 2 0.05
#> stim-ctrl_NK cells 254 201 0.05
#>
#> No FEA results stored.
#>
#> Scenario Info:
#> - stim-ctrl_Dendritic cells :
#> This result contains the output of pseudobulk DE analysis performed on
#> dendritic cells, comparing untreated samples to those stimulated with IFNβ
#>
#>
#> No scenario info for: stim-ctrl_B cells, stim-ctrl_CD14+ Monocytes, stim-ctrl_CD4 T cells, stim-ctrl_CD8 T cells, stim-ctrl_FCGR3A+ Monocytes, stim-ctrl_Megakaryocytes, stim-ctrl_NK cells
sessionInfo()
#> R Under development (unstable) (2025-10-20 r88955)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] muscData_1.25.0 muscat_1.25.0
#> [3] scater_1.39.0 ggplot2_4.0.1
#> [5] scuttle_1.21.0 ExperimentHub_3.1.0
#> [7] AnnotationHub_4.1.0 BiocFileCache_3.1.0
#> [9] dbplyr_2.5.1 DEFormats_1.39.0
#> [11] edgeR_4.9.0 limma_3.67.0
#> [13] DESeq2_1.51.6 macrophage_1.27.0
#> [15] DeeDeeExperiment_1.1.2 SingleCellExperiment_1.33.0
#> [17] SummarizedExperiment_1.41.0 Biobase_2.71.0
#> [19] GenomicRanges_1.63.0 Seqinfo_1.1.0
#> [21] IRanges_2.45.0 S4Vectors_0.49.0
#> [23] BiocGenerics_0.57.0 generics_0.1.4
#> [25] MatrixGenerics_1.23.0 matrixStats_1.5.0
#> [27] BiocStyle_2.39.0
#>
#> loaded via a namespace (and not attached):
#> [1] fs_1.6.6 bitops_1.0-9 httr_1.4.7
#> [4] RColorBrewer_1.1-3 doParallel_1.0.17 numDeriv_2016.8-1.1
#> [7] tools_4.6.0 sctransform_0.4.2 backports_1.5.0
#> [10] R6_2.6.1 uwot_0.2.4 mgcv_1.9-4
#> [13] apeglm_1.33.0 GetoptLong_1.1.0 withr_3.0.2
#> [16] prettyunits_1.2.0 gridExtra_2.3 fdrtool_1.2.18
#> [19] cli_3.6.5 sandwich_3.1-1 slam_0.1-55
#> [22] sass_0.4.10 mvtnorm_1.3-3 S7_0.2.1
#> [25] blme_1.0-6 yulab.utils_0.2.2 DOSE_4.5.0
#> [28] R.utils_2.13.0 dichromat_2.0-0.1 parallelly_1.45.1
#> [31] bbmle_1.0.25.1 RSQLite_2.4.5 shape_1.4.6.1
#> [34] gtools_3.9.5 dplyr_1.1.4 GO.db_3.22.0
#> [37] Matrix_1.7-4 ggbeeswarm_0.7.3 abind_1.4-8
#> [40] R.methodsS3_1.8.2 lifecycle_1.0.4 multcomp_1.4-29
#> [43] yaml_2.3.11 gplots_3.3.0 qvalue_2.43.0
#> [46] SparseArray_1.11.8 grid_4.6.0 blob_1.2.4
#> [49] crayon_1.5.3 bdsmatrix_1.3-7 lattice_0.22-7
#> [52] beachmat_2.27.0 cowplot_1.2.0 KEGGREST_1.51.1
#> [55] pillar_1.11.1 knitr_1.50 ComplexHeatmap_2.27.0
#> [58] fgsea_1.37.2 rjson_0.2.23 boot_1.3-32
#> [61] estimability_1.5.1 corpcor_1.6.10 future.apply_1.20.0
#> [64] codetools_0.2-20 fastmatch_1.1-6 glue_1.8.0
#> [67] data.table_1.17.8 vctrs_0.6.5 png_0.1-8
#> [70] Rdpack_2.6.4 gtable_0.3.6 emdbook_1.3.14
#> [73] cachem_1.1.0 xfun_0.54 rbibutils_2.4
#> [76] S4Arrays_1.11.1 coda_0.19-4.1 reformulas_0.4.2
#> [79] survival_3.8-3 iterators_1.0.14 statmod_1.5.1
#> [82] TH.data_1.1-5 nlme_3.1-168 pbkrtest_0.5.5
#> [85] bit64_4.6.0-1 RcppAnnoy_0.0.22 progress_1.2.3
#> [88] EnvStats_3.1.0 filelock_1.0.3 bslib_0.9.0
#> [91] TMB_1.9.18 irlba_2.3.5.1 vipor_0.4.7
#> [94] KernSmooth_2.23-26 colorspace_2.1-2 DBI_1.2.3
#> [97] tidyselect_1.2.1 emmeans_2.0.0 bit_4.6.0
#> [100] compiler_4.6.0 curl_7.0.0 httr2_1.2.1
#> [103] BiocNeighbors_2.5.0 DelayedArray_0.37.0 bookdown_0.46
#> [106] checkmate_2.3.3 scales_1.4.0 caTools_1.18.3
#> [109] remaCor_0.0.20 rappdirs_0.3.3 stringr_1.6.0
#> [112] digest_0.6.39 minqa_1.2.8 variancePartition_1.41.2
#> [115] rmarkdown_2.30 aod_1.3.3 XVector_0.51.0
#> [118] RhpcBLASctl_0.23-42 htmltools_0.5.9 pkgconfig_2.0.3
#> [121] lme4_1.1-38 lpsymphony_1.39.0 fastmap_1.2.0
#> [124] rlang_1.1.6 GlobalOptions_0.1.3 farver_2.1.2
#> [127] jquerylib_0.1.4 IHW_1.39.0 zoo_1.8-14
#> [130] jsonlite_2.0.0 BiocParallel_1.45.0 GOSemSim_2.37.0
#> [133] R.oo_1.27.1 BiocSingular_1.27.1 magrittr_2.0.4
#> [136] Rcpp_1.1.0 viridis_0.6.5 stringi_1.8.7
#> [139] MASS_7.3-65 plyr_1.8.9 parallel_4.6.0
#> [142] listenv_0.10.0 ggrepel_0.9.6 Biostrings_2.79.2
#> [145] splines_4.6.0 hms_1.1.4 circlize_0.4.16
#> [148] locfit_1.5-9.12 reshape2_1.4.5 ScaledMatrix_1.19.0
#> [151] BiocVersion_3.23.1 evaluate_1.0.5 BiocManager_1.30.27
#> [154] nloptr_2.2.1 foreach_1.5.2 tidyr_1.3.1
#> [157] purrr_1.2.0 future_1.68.0 clue_0.3-66
#> [160] rsvd_1.0.5 broom_1.0.11 xtable_1.8-4
#> [163] RSpectra_0.16-2 fANCOVA_0.6-1 viridisLite_0.4.2
#> [166] tibble_3.3.0 lmerTest_3.1-3 glmmTMB_1.1.13
#> [169] memoise_2.0.1 beeswarm_0.4.0 AnnotationDbi_1.73.0
#> [172] cluster_2.1.8.1 globals_0.18.0
Kang, Hyun Min, Meena Subramaniam, Sasha Targ, Michelle Nguyen, Lenka Maliskova, Elizabeth McCarthy, Eunice Wan, et al. 2017. “Multiplexed Droplet Single-Cell Rna-Sequencing Using Natural Genetic Variation.” Nature Biotechnology 36 (1): 89–94. https://doi.org/10.1038/nbt.4042.