Sesame provide convenient function to compare your sample with public data sets processed with the same pipeline. All you need is a raw SigSet.
SeSAMe provides a set of quality control steps. Be sure to pre-cache HM450 annotation data from ExperimentHub. This only needs to be done once per sesame installation.
# library(FlowSorted.Blood.450k)
# ssets <- RGChannelSetToSigSets(FlowSorted.Blood.450k[,1:10])
ssets = sesameDataGet("EPIC.5.normal")The SeSAMe QC function returns an sesameQC object which can be directly printed onto the screen.
## 
## =======================
## =      Intensities    =
## =======================
## No. probes (num_probes_all)        866895 
## mean (M/U) (mean_intensity):       2657.248 
## mean (M+U) (mean_intensity_total): 5314.496 
## 
## -- Infinium II --
## No. probes: (num_probes_II)        724612 (83.587%)
## Mean Intensity (mean_ii):          2526.561 
## 
## -- Infinium I (Red) -- 
## No. probes: (num_probes_IR)        92294 (10.647%)
## No. Probes Consistent Channel:     92045 
## No. Porbes Swapped Channel:        105 
## No. Probes Low Intensity:          144 
## Mean Intensity (in-band):          3579.385 
## Mean Intensity (out-of-band):      524.6618 
## 
## -- Infinium I (Grn) -- 
## No. probes:                     49989 (5.766%)
## No. Probes Consistent Channel:     49276 
## No. Probes Swapped Channel:        630 
## No. Probes Low Intensity:          83 
## Mean Intensity (in-band):          2849.083 
## Mean Intensity (out-of-band):      303.2474 
## 
## =======================
## =    Non-detection    =
## =======================
## No. probes:                        33607 
## No. probes w/ NA (num_na):         33607 (3.877%)
## No. nondetection (num_nondt):      33607 (3.877%)
## 
## =======================
## =      Beta Values    =
## =======================
## Mean Betas:                        0.57629 
## Median Betas:                      0.7269904 
## 
## -- cg probes --
## No. Probes:                        863904 
## No. Probes with NA:                33054 (3.826%)
## Mean Betas:                        0.5777363 
## Median Betas:                      0.7290963 
## % Unmethylated (Beta < 0.3):       30.803%
## % Methylated (Beta > 0.7):         51.886%
## 
## -- ch probes --
## No. Probes:                        2932 
## No. Probes with NA:                550 (18.759%)
## Mean Betas:                        0.07318857 
## Median Betas:                      0.06107732 
## % Unmethylated (Beta < 0.3):       99.874%
## % Methylated (Beta > 0.7):         0.000%
## 
## -- rs probes --
## No. Probes:                        59 
## No. Probes with NA:                3 (5.085%)
## Mean Betas:                        0.518126 
## Median Betas:                      0.5150823 
## % Unmethylated (Beta < 0.3):       30.357%
## % Methylated (Beta > 0.7):         33.929%The sesameQC object can be coerced into data.frame and linked using the following code
qc5 <- do.call(rbind, lapply(ssets, function(x)
    as.data.frame(sesameQC(x))))
qc5$sample_name <- names(ssets)
qc5[,c('mean_beta_cg','frac_meth_cg','frac_unmeth_cg')]The background level is given by mean_oob_grn and mean_oob_red
library(ggplot2)
ggplot(qc5,
    aes(x = mean_oob_grn, y= mean_oob_red, label = sample_name)) +
    geom_point() + geom_text(hjust = -0.1, vjust = 0.1) +
    geom_abline(intercept = 0, slope = 1, linetype = 'dotted') +
    xlab('Green Background') + ylab('Red Background') +
    xlim(c(200,600)) + ylim(c(200,600))The mean {M,U} intensity can be reached by mean_intensity. Similarly, the mean M+U intensity can be reached by mean_intensity_total. Low intensities are symptomatic of low input or poor hybridization.
library(wheatmap)
p1 <- ggplot(qc5) +
    geom_bar(aes(sample_name, mean_intensity), stat='identity') +
    xlab('Sample Name') + ylab('Mean Intensity') +
    ylim(0,18000) +
    theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
p2 <- ggplot(qc5) +
    geom_bar(aes(sample_name, mean_intensity_total), stat='identity') +
    xlab('Sample Name') + ylab('Mean M+U Intensity') +
    ylim(0,18000) +
    theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
WGG(p1) + WGG(p2, RightOf())The fraction of NAs are signs of masking due to variety of reasons including failed detection, high background, putative low quality probes etc. This number can be reached in frac_na_cg and num_na_cg (the cg stands for CpG probes, so we also have num_na_ch and num_na_rs)
p1 <- ggplot(qc5) +
    geom_bar(aes(sample_name, num_na_cg), stat='identity') +
    xlab('Sample Name') + ylab('Number of NAs') +
    theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
p2 <- ggplot(qc5) +
    geom_bar(aes(sample_name, frac_na_cg), stat='identity') +
    xlab('Sample Name') + ylab('Fraction of NAs (%)') +
    theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))
WGG(p1) + WGG(p2, RightOf())The fraction of color channel switch can be found in InfI_switch_G2R and InfI_switch_R2G. These numbers are symptomatic of how Infinium I probes are affected by SNP-induced color channel switching.
Infinium platforms are intrinsically robust to incomplete bisulfite conversion as non-converted probes would fail to hybridize to the target. Residual incomplete bisulfite conversion can be quantified using GCT score based on C/T-extension probes. Details of this method can be found in Zhou et al. 2017. The closer the score to 1.0, the more complete the bisulfite conversion.
## [1] 1.067894SeSAMe can output explicit and Infinium-I-derived SNP to VCF. This information can be used to identify sample swaps.
## Retrieving annotation from  https://zhouserver.research.chop.edu/InfiniumAnnotation/current/EPIC/EPIC.hg19.snp_overlap_b151.rds ... Done.## Retrieving annotation from  https://zhouserver.research.chop.edu/InfiniumAnnotation/current/EPIC/EPIC.hg19.typeI_overlap_b151.rds ... Done.One can output to actual VCF file with a header by formatVCF(sset, vcf=path_to_vcf).