library(methylSig)
The purpose of this vignette is to show users how to retrofit their methylSig
< 0.99.0 code to work with the refactor in version 0.99.0 and later.
In versions < 0.99.0 of methylSig
, the methylSigReadData()
function read Bismark coverage files, Bismark genome-wide CpG reports, or MethylDackel bedGraphs. Additionally, users could destrand the data, filter by coverage, and filter SNPs.
meth = methylSigReadData(
fileList = files,
pData = pData,
assembly = 'hg19',
destranded = TRUE,
maxCount = 500,
minCount = 10,
filterSNPs = TRUE,
num.cores = 1,
fileType = 'cytosineReport')
In versions >= 0.99.0 of methylSig
, the user should read data with bsseq::read.bismark()
and then apply functions that were once bundled within methylSigReadData()
.
files = c(
system.file('extdata', 'bis_cov1.cov', package='methylSig'),
system.file('extdata', 'bis_cov2.cov', package='methylSig')
)
bsseq_stranded = bsseq::read.bismark(
files = files,
colData = data.frame(row.names = c('test1','test2')),
rmZeroCov = FALSE,
strandCollapse = FALSE
)
After reading data, filter by coverage. Note, we are changing our dataset to something we can use with the downstream functions.
# Load data for use in the rest of the vignette
data(BS.cancer.ex, package = 'bsseqData')
bs = BS.cancer.ex[1:10000]
bs = filter_loci_by_coverage(bs, min_count = 5, max_count = 500)
If the locations of C-to-T and G-to-A SNPs are known, or some other set of location should be removed:
# Construct GRanges object
remove_gr = GenomicRanges::GRanges(
seqnames = c('chr21', 'chr21', 'chr21'),
ranges = IRanges::IRanges(
start = c(9411552, 9411784, 9412099),
end = c(9411552, 9411784, 9412099)
)
)
bs = filter_loci_by_location(bs = bs, gr = remove_gr)
In versions < 0.99.0 of methylSig
, the methylSigTile()
function combined aggregating CpG data over pre-defined tiles and genomic windows.
# For genomic windows, tiles = NULL
windowed_meth = methylSigTile(meth, tiles = NULL, win.size = 10000)
# For pre-defined tiles, tiles should be a GRanges object.
In versions >= 0.99.0 of methylSig
, tiling is separated into two functions, tile_by_regions()
and tile_by_windows()
. Users should chooose one or the other.
windowed_bs = tile_by_windows(bs = bs, win_size = 10000)
# Collapsed promoters on chr21 and chr22
data(promoters_gr, package = 'methylSig')
promoters_bs = tile_by_regions(bs = bs, gr = promoters_gr)
In versions < 0.99.0 of methylSig
, the methylSigCalc
function had a min.per.group
parameter to determine how many samples per group had to have coverage in order to be tested.
result = methylSigCalc(
meth = meth,
comparison = 'DR_vs_DS',
dispersion = 'both',
local.info = FALSE,
local.winsize = 200,
min.per.group = c(3,3),
weightFunc = methylSig_weightFunc,
T.approx = TRUE,
num.cores = 1)
In versions >= 0.99.0 of methylSig
, the min.per.group
functionality is performed by a separate function filter_loci_by_group_coverage()
. Also note the change in form to define dispersion calculations, and the use of local information.
# Look a the phenotype data for bs
bsseq::pData(bs)
#> DataFrame with 6 rows and 2 columns
#> Type Pair
#> <character> <character>
#> C1 cancer pair1
#> C2 cancer pair2
#> C3 cancer pair3
#> N1 normal pair1
#> N2 normal pair2
#> N3 normal pair3
# Require at least two samples from cancer and two samples from normal
bs = filter_loci_by_group_coverage(
bs = bs,
group_column = 'Type',
c('cancer' = 2, 'normal' = 2))
After removing loci with insufficient information, we can now use the diff_methylsig()
test.
# Test cancer versus normal with dispersion from both groups
diff_gr = diff_methylsig(
bs = bs,
group_column = 'Type',
comparison_groups = c('case' = 'cancer', 'control' = 'normal'),
disp_groups = c('case' = TRUE, 'control' = TRUE),
local_window_size = 0,
t_approx = TRUE,
n_cores = 1)
In versions < 0.99.0 of methylSig
, the methylSigDSS
function also had a min.per.group
parameter to determine how many samples per group had to have coverage. Users also couldn’t specify which methylation groups to recover. The form of design
, formula
, and contrast
, remain the same in versions >= 0.99.0.
contrast = matrix(c(0,1), ncol = 1)
result_dss = methylSigDSS(
meth = meth,
design = design1,
formula = '~ group',
contrast = contrast,
group.term = 'group',
min.per.group=c(3,3))
In versions >= 0.99.0 of methylSig
, the single methylSigDSS()
function is replaced by a fit function diff_dss_fit()
and a test functiotn diff_dss_test()
. As with diff_methylsig()
, users should ensure enough samples have sufficient coverage with the filter_loci_by_group_coverage()
function. The design
and formula
are unchanged in their forms.
If a continuous covariate is to be tested, filter_loci_by_group_coverage()
should be skipped, as there are no groups. In prior versions of methylSigDSS()
, this was not possible, and the group constraints were incorrectly applied prior to testing on a continuous covariate.
# IF NOT DONE PREVIOUSLY
# Require at least two samples from cancer and two samples from normal
bs = filter_loci_by_group_coverage(
bs = bs,
group_column = 'Type',
c('cancer' = 2, 'normal' = 2))
# Test the simplest model with an intercept and Type
diff_fit_simple = diff_dss_fit(
bs = bs,
design = bsseq::pData(bs),
formula = as.formula('~ Type'))
#> Fitting DML model for CpG site:
The contrast
parameter is also changed in its form. Note the, additional parameters to specify how to recover group methylation. methylation_group_column
and methylation_groups
should be specified for group versus group comparisons. For continuous covariates, methylation_group_column
is sufficient, and the samples will be grouped into top/bottom 25 percentile based on the continuous covariate column name given in methylation_group_column
.
# Test the simplest model for cancer vs normal
# Note, 2 rows corresponds to 2 columns in diff_fit_simple$X
simple_contrast = matrix(c(0,1), ncol = 1)
diff_simple_gr = diff_dss_test(
bs = bs,
diff_fit = diff_fit_simple,
contrast = simple_contrast,
methylation_group_column = 'Type',
methylation_groups = c('case' = 'cancer', 'control' = 'normal'))
sessionInfo()
#> R version 4.3.0 RC (2023-04-13 r84269)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] methylSig_1.12.0 BiocStyle_2.28.0
#>
#> loaded via a namespace (and not attached):
#> [1] SummarizedExperiment_1.30.0 rjson_0.2.21
#> [3] xfun_0.39 bslib_0.4.2
#> [5] rhdf5_2.44.0 Biobase_2.60.0
#> [7] lattice_0.21-8 rhdf5filters_1.12.0
#> [9] tools_4.3.0 bitops_1.0-7
#> [11] stats4_4.3.0 parallel_4.3.0
#> [13] R.oo_1.25.0 Matrix_1.5-4
#> [15] data.table_1.14.8 BSgenome_1.68.0
#> [17] S4Vectors_0.38.0 sparseMatrixStats_1.12.0
#> [19] lifecycle_1.0.3 GenomeInfoDbData_1.2.10
#> [21] compiler_4.3.0 Rsamtools_2.16.0
#> [23] Biostrings_2.68.0 munsell_0.5.0
#> [25] codetools_0.2-19 permute_0.9-7
#> [27] GenomeInfoDb_1.36.0 htmltools_0.5.5
#> [29] sass_0.4.5 RCurl_1.98-1.12
#> [31] yaml_2.3.7 crayon_1.5.2
#> [33] jquerylib_0.1.4 R.utils_2.12.2
#> [35] BiocParallel_1.34.0 DelayedArray_0.26.0
#> [37] cachem_1.0.7 limma_3.56.0
#> [39] gtools_3.9.4 locfit_1.5-9.7
#> [41] digest_0.6.31 restfulr_0.0.15
#> [43] bookdown_0.33 splines_4.3.0
#> [45] fastmap_1.1.1 grid_4.3.0
#> [47] colorspace_2.1-0 cli_3.6.1
#> [49] DSS_2.48.0 XML_3.99-0.14
#> [51] DelayedMatrixStats_1.22.0 scales_1.2.1
#> [53] rmarkdown_2.21 XVector_0.40.0
#> [55] matrixStats_0.63.0 R.methodsS3_1.8.2
#> [57] beachmat_2.16.0 HDF5Array_1.28.0
#> [59] evaluate_0.20 knitr_1.42
#> [61] BiocIO_1.10.0 GenomicRanges_1.52.0
#> [63] IRanges_2.34.0 rtracklayer_1.60.0
#> [65] rlang_1.1.0 Rcpp_1.0.10
#> [67] BiocManager_1.30.20 BiocGenerics_0.46.0
#> [69] bsseq_1.36.0 jsonlite_1.8.4
#> [71] R6_2.5.1 Rhdf5lib_1.22.0
#> [73] GenomicAlignments_1.36.0 MatrixGenerics_1.12.0
#> [75] zlibbioc_1.46.0