---
title: "Identifying DMCs using Bayesian functional regressions in BS-Seq data"
author:
- name: Farhad Shokoohi
  affiliation: Department of Mathematical Sciences, 
    University of Nevada, Las Vegas
  email: shokoohi@icloud.com 
date: "`r doc_date()`"
package: DMCFB
abstract: "`Instructions on using the DMCFB package.`"
vignette: >
  %\VignetteIndexEntry{Identifying DMCs using Bayesian functional regressions in BS-Seq data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document  
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The `DMCFB` package is a pipeline to identify differentially methylated 
cytosine (DMC) in bisulfite sequencing data using Bayesian functional
regression models. 
In what follows we provides some guidelines on how to read your data and analyze
them.

# Reading data
## Reading bisulfite data (using files)
The R-method `readBismark()` is used to read bisulfite data files that are 
created by `Bismark`. Each file must include six columns, with no header, that 
represent 

 - Chromosome 
 - Start position in the chromosome  
 - End position in the chromosome
 - Methylation level (m/(m+u))
 - Number of methylated reads (m)
 - Number of un-methylated reads (u)

and each row is a cytosine (or a small region) in DNA. 

The function  `readBismark(<files' paths>, <files' names>)` has two inputs: 
'the paths of the files' and 'the names of the files'.
Using this function an object of class `BSDMC` is created. 
Extra information about data such as Age, Gender, Group, etc, must be assigned
to the object using `DataFrame` function. 
As an example, we have provided three files in the package that can be read as 
follows:

```{r, eval=TRUE}
library(DMCFB)
fn <- list.files(system.file("extdata",package = "DMCFB"))
fn.f <- list.files(system.file("extdata",package="DMCFB"), full.names=TRUE)
OBJ <- readBismark(fn.f, fn, mc.cores = 2)
cdOBJ <- DataFrame(Cell = factor(c("BC", "TC","Mono"),
levels = c("BC", "TC", "Mono")), row.names = c("BCU1568","BCU173","BCU551"))
colData(OBJ) <- cdOBJ
OBJ
```


## Reading bisulfite data (using matrices)
Alternatively, one can use two integer matrices and a `DataFrame` to create
`BSDMC` object using `cBSDMC()` function. One matrix includes the read-depth and
the other one includes methylation reads. The columns of these matrices 
represent samples and the rows represent cytosine positions. 

Additional information about the genomic positions and covariates must be stored
in a `DataFrame` and then assign to the object. 

The following exampel shows the details. 

```{r, eval=TRUE, message=FALSE}
library(DMCFB)
set.seed(1980)
nr <- 1000
nc <- 8
metht <- matrix(as.integer(runif(nr * nc, 0, 100)), nr)
methc <- matrix(rbinom(n=nr*nc,c(metht),prob = runif(nr*nc)),nr,nc)
methl <- methc/metht
r1 <- GRanges(rep('chr1', nr), IRanges(1:nr, width=1), strand='*')
names(r1) <- 1:nr
cd1 <- DataFrame(Group=rep(c('G1','G2'),each=nc/2),row.names=LETTERS[1:nc])
OBJ2 <- cBSDMC(rowRanges=r1,methReads=methc,totalReads=metht,
  methLevels=methl,colData=cd1)
OBJ2
```

# Identifying DMCs
To identify DMCs, one need to use the function `findDMCFB()` function. 
The function

```{r, eval=FALSE}
library(DMCFB)
start.time <- Sys.time()
path0 <- "..//BCData/" # provide the path to the files
namelist.new <- list.files(path0,pattern="blk",full.names=F)
namelist.new.f <- list.files(path0,pattern="blk",full.names=T)
type <- NULL
for(i in seq_along(namelist.new)){
    type[i] <- unlist(strsplit(namelist.new[i], split=c('_'), fixed=TRUE))[2]
}
type
table(type)
indTC <- which(type=="TC")
indBC <- which(type=="BC")
indMono <- which(type=="Mono")
namelist.new <- namelist.new[c(indBC,indMono,indTC)]
namelist.new.f <- namelist.new.f[c(indBC,indMono,indTC)]
BLKDat <- readBismark(namelist.new.f, namelist.new, mc.cores = 2)
colData1 <- DataFrame(Group = factor(
  c(rep("BC",length(indBC)), rep("Mono",length(indMono)), 
  rep("TC", length(indTC))), levels = c("BC", "Mono", "TC")), 
  row.names = colnames(BLKData))
colData(BLKDat) <- colData1
BLK.BC.Mono.TC <- sort(BLKDat)
DMC.obj = findDMCFB(object = BLKDat, bwa = 30, bwb = 30, nBurn = 300, nMC = 300,
  nThin = 1, alpha = 5e-5, pSize = 500, sfiles = FALSE)
```


# Figures
To plot DMCs one can use the `plotDMCFB()` function to plot an `BSDMC` object 
that resulted from running `findDMCFB()` function. 
To illustrate use the following example:


```{r, fig.width=6}
library(DMCFB)
set.seed(1980)
nr <- 1000
nc <- 8
metht <- matrix(as.integer(runif(nr * nc, 0, 100)), nr)
methc <- matrix(rbinom(n=nr*nc,c(metht),prob = runif(nr*nc)),nr,nc)
methl <- methc/metht
r1 <- GRanges(rep('chr1', nr), IRanges(1:nr, width=1), strand='*')
names(r1) <- 1:nr
cd1 <- DataFrame(Group=rep(c('G1','G2'),each=nc/2),row.names=LETTERS[1:nc])
OBJ1 <- cBSDMC(rowRanges=r1,methReads=methc,totalReads=metht,
  methLevels=methl,colData=cd1)
OBJ2 = findDMCFB(object = OBJ1, bwa = 30, bwb = 30, nBurn = 10, nMC = 10,
  nThin = 1, alpha = 0.05, pSize = 500, sfiles = FALSE)
plotDMCFB(OBJ2, region = c(1,400), nSplit = 2)
```

# Session info
```{r}
sessionInfo()
```