The main function to calculate the quality metrics is
sesameQC_calcStats
. This function takes a SigDF, calculates
the QC statistics, and returns a single S4 sesameQC
object,
which can be printed directly to the console. To calculate QC metrics on
a given list of samples or all IDATs in a folder, one can use
sesameQC_calcStats
within the standard
openSesame
pipeline. When used with
openSesame
, a list of sesameQC
s will be
returned. Note that one should turn off preprocessing using
prep=""
:
## calculate metrics on all IDATs in a specific folder
sesameQCtoDF(openSesame(idat_dir, prep="", func=sesameQC_calcStats))
## or a list of prefixes, with parallel processing
sesameQCtoDF(openSesame(sprintf("%s/%s", idat_dir, idat_prefixes), prep="",
func=sesameQC_calcStats, BPPARAM=BiocParallel::MulticoreParam(24)))
The results display frac_dt_cg
, RGratio
,
RGdistort
by default. For other QC metrics, SeSAMe divides
sample quality metrics into multiple groups. These groups are listed
below and can be referred to by short keys. For example, “intensity”
generates signal intensity-related quality metrics.
Short.Key | Description |
---|---|
detection | Signal Detection |
numProbes | Number of Probes |
intensity | Signal Intensity |
channel | Color Channel |
dyeBias | Dye Bias |
betas | Beta Value |
By default, sesameQC_calcStats
calculates all QC groups.
To save time, one can compute a specific QC group by specifying one or
multiple short keys in the funs=
argument:
sdfs <- sesameDataGet("EPIC.5.SigDF.normal")[1:2] # get two examples
## only compute signal detection stats
qcs = openSesame(sdfs, prep="", func=sesameQC_calcStats, funs="detection")
qcs[[1]]
##
## =====================
## | Detection
## =====================
## N. Probes w/ Missing Raw Intensity : 0 (num_dtna)
## % Probes w/ Missing Raw Intensity : 0.0 % (frac_dtna)
## N. Probes w/ Detection Success : 838020 (num_dt)
## % Detection Success : 96.7 % (frac_dt)
## N. Detection Succ. (after masking) : 838020 (num_dt_mk)
## % Detection Succ. (after masking) : 96.7 % (frac_dt_mk)
## N. Probes w/ Detection Success (cg) : 835491 (num_dt_cg)
## % Detection Success (cg) : 96.7 % (frac_dt_cg)
## N. Probes w/ Detection Success (ch) : 2471 (num_dt_ch)
## % Detection Success (ch) : 84.3 % (frac_dt_ch)
## N. Probes w/ Detection Success (rs) : 58 (num_dt_rs)
## % Detection Success (rs) : 98.3 % (frac_dt_rs)
We consider signal detection the most important QC metric.
One can retrieve the actual stat numbers from sesameQC
using the sesameQC_getStats (the following generates the fraction of
probes with detection success):
## [1] 0.9666915
After computing the QCs, one can optionally combine the
sesameQC
objects into a data frame for easy comparison.
Note that when the input is an SigDF
object, calling
sesameQC_calcStats
within openSesame
and as a
standalone function are equivalent.
sdf <- sesameDataGet('EPIC.1.SigDF')
qc = openSesame(sdf, prep="", func=sesameQC_calcStats, funs=c("detection"))
## equivalent direct call
qc = sesameQC_calcStats(sdf, c("detection"))
qc
##
## =====================
## | Detection
## =====================
## N. Probes w/ Missing Raw Intensity : 0 (num_dtna)
## % Probes w/ Missing Raw Intensity : 0.0 % (frac_dtna)
## N. Probes w/ Detection Success : 834922 (num_dt)
## % Detection Success : 96.3 % (frac_dt)
## N. Detection Succ. (after masking) : 834922 (num_dt_mk)
## % Detection Succ. (after masking) : 96.3 % (frac_dt_mk)
## N. Probes w/ Detection Success (cg) : 832046 (num_dt_cg)
## % Detection Success (cg) : 96.4 % (frac_dt_cg)
## N. Probes w/ Detection Success (ch) : 2616 (num_dt_ch)
## % Detection Success (ch) : 89.2 % (frac_dt_ch)
## N. Probes w/ Detection Success (rs) : 58 (num_dt_rs)
## % Detection Success (rs) : 98.3 % (frac_dt_rs)
SeSAMe features comparison of your sample with public data sets. The
sesameQC_rankStats()
function ranks the input
sesameQC
object with sesameQC
calculated from
public datasets. It shows the rank percentage of the input sample as
well as the number of datasets compared.
##
## =====================
## | Signal Intensity
## =====================
## Mean sig. intensity : 3171.21 (mean_intensity)
## Mean sig. intensity (M+U) : 6342.41 (mean_intensity_MU)
## Mean sig. intensity (Inf.II) : 2991.85 (mean_ii)
## Mean sig. intens.(I.Grn IB) : 3004.33 (mean_inb_grn)
## Mean sig. intens.(I.Red IB) : 4670.97 (mean_inb_red)
## Mean sig. intens.(I.Grn OOB) : 318.55 (mean_oob_grn)
## Mean sig. intens.(I.Red OOB) : 606.99 (mean_oob_red)
## N. NA in M (all probes) : 0 (na_intensity_M)
## N. NA in U (all probes) : 0 (na_intensity_U)
## N. NA in raw intensity (IG) : 0 (na_intensity_ig)
## N. NA in raw intensity (IR) : 0 (na_intensity_ir)
## N. NA in raw intensity (II) : 0 (na_intensity_ii)
##
## =====================
## | Signal Intensity
## =====================
## Mean sig. intensity : 3171.21 (mean_intensity) - Rank 15.7% (N=636)
## Mean sig. intensity (M+U) : 6342.41 (mean_intensity_MU)
## Mean sig. intensity (Inf.II) : 2991.85 (mean_ii) - Rank 15.6% (N=636)
## Mean sig. intens.(I.Grn IB) : 3004.33 (mean_inb_grn) - Rank 7.5% (N=636)
## Mean sig. intens.(I.Red IB) : 4670.97 (mean_inb_red) - Rank 21.2% (N=636)
## Mean sig. intens.(I.Grn OOB) : 318.55 (mean_oob_grn) - Rank 4.2% (N=636)
## Mean sig. intens.(I.Red OOB) : 606.99 (mean_oob_red) - Rank 3.6% (N=636)
## N. NA in M (all probes) : 0 (na_intensity_M)
## N. NA in U (all probes) : 0 (na_intensity_U)
## N. NA in raw intensity (IG) : 0 (na_intensity_ig)
## N. NA in raw intensity (IR) : 0 (na_intensity_ir)
## N. NA in raw intensity (II) : 0 (na_intensity_ii)
SeSAMe provides functions to create QC plots. Some functions takes sesameQC as input while others directly plot the SigDF objects. Here are some examples:
sesameQC_plotBar()
takes a list of sesameQC objects
and creates bar plot for each metric calculated.
sesameQC_plotRedGrnQQ()
graphs the dye bias between
the two color channels.
sesameQC_plotIntensVsBetas()
plots the relationship
between β values and signal
intensity and can be used to diagnose artificial readout and influence
of signal background.
sesameQC_plotHeatSNPs()
plots SNP probes and can be
used to detect sample swaps.
More about quality control plots can be found in Supplemental Vignette.
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.49 sesame_1.25.3 sesameData_1.25.0
## [4] ExperimentHub_2.15.0 AnnotationHub_3.15.0 BiocFileCache_2.15.0
## [7] dbplyr_2.5.0 BiocGenerics_0.53.3 generics_0.1.3
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.1.4
## [3] blob_1.2.4 filelock_1.0.3
## [5] Biostrings_2.75.3 fastmap_1.2.0
## [7] digest_0.6.37 lifecycle_1.0.4
## [9] KEGGREST_1.47.0 RSQLite_2.3.9
## [11] magrittr_2.0.3 compiler_4.5.0
## [13] rlang_1.1.4 sass_0.4.9
## [15] tools_4.5.0 yaml_2.3.10
## [17] S4Arrays_1.7.1 bit_4.5.0.1
## [19] curl_6.1.0 DelayedArray_0.33.3
## [21] plyr_1.8.9 RColorBrewer_1.1-3
## [23] abind_1.4-8 BiocParallel_1.41.0
## [25] withr_3.0.2 purrr_1.0.2
## [27] grid_4.5.0 stats4_4.5.0
## [29] preprocessCore_1.69.0 wheatmap_0.2.0
## [31] colorspace_2.1-1 ggplot2_3.5.1
## [33] scales_1.3.0 SummarizedExperiment_1.37.0
## [35] cli_3.6.3 rmarkdown_2.29
## [37] crayon_1.5.3 reshape2_1.4.4
## [39] httr_1.4.7 tzdb_0.4.0
## [41] DBI_1.2.3 cachem_1.1.0
## [43] stringr_1.5.1 parallel_4.5.0
## [45] AnnotationDbi_1.69.0 BiocManager_1.30.25
## [47] XVector_0.47.2 matrixStats_1.5.0
## [49] vctrs_0.6.5 Matrix_1.7-1
## [51] jsonlite_1.8.9 IRanges_2.41.2
## [53] hms_1.1.3 S4Vectors_0.45.2
## [55] bit64_4.5.2 jquerylib_0.1.4
## [57] glue_1.8.0 codetools_0.2-20
## [59] stringi_1.8.4 gtable_0.3.6
## [61] BiocVersion_3.21.1 GenomeInfoDb_1.43.2
## [63] GenomicRanges_1.59.1 UCSC.utils_1.3.0
## [65] munsell_0.5.1 tibble_3.2.1
## [67] pillar_1.10.1 rappdirs_0.3.3
## [69] htmltools_0.5.8.1 GenomeInfoDbData_1.2.13
## [71] R6_2.5.1 evaluate_1.0.1
## [73] Biobase_2.67.0 lattice_0.22-6
## [75] readr_2.1.5 png_0.1-8
## [77] memoise_2.0.1 BiocStyle_2.35.0
## [79] bslib_0.8.0 Rcpp_1.0.13-1
## [81] SparseArray_1.7.2 xfun_0.50
## [83] MatrixGenerics_1.19.0 pkgconfig_2.0.3