Installation

CytoMethIC is distributed through Bioconductor. The code below installs the BiocManager package if it is missing and then uses it to install CytoMethIC:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
if (!requireNamespace("CytoMethIC", quietly = TRUE)) {
  BiocManager::install("CytoMethIC")
}

Introduction

CytoMethIC provides model data and functions for applying machine learning models that use DNA methylome data to classify the cancer type and phenotype of a sample. The package was developed to abstract away the low-level, model-specific code otherwise required to run such models in R. It provides this abstraction for random forest, e1071 support vector machine, extreme gradient boosting, and TensorFlow models. It is paired with an ExperimentHub component that contains our lab’s models for epigenetic cancer classification and phenotype prediction. These include CNS tumor classification, pan-cancer classification, race prediction, cell-of-origin classification, and subtype classification models.

library(CytoMethIC)
library(ExperimentHub)
library(sesame)
sesameDataCache()

Data from ExperimentHub

For these examples, we’ll be using models from ExperimentHub and a sample from sesameData.

CytoMethIC supported models

ModelID                  Prediction label description
rfc_cancertype_TCGA33    TCGA cancer types (N=33)
svm_cancertype_TCGA33    TCGA cancer types (N=33)
xgb_cancertype_TCGA33    TCGA cancer types (N=33)
mlp_cancertype_TCGA33    TCGA cancer types (N=33)
rfc_cancertype_CNS66     CNS Tumor Class (N=66)
svm_cancertype_CNS66     CNS Tumor Class (N=66)
xgb_cancertype_CNS66     CNS Tumor Class (N=66)
mlp_cancertype_CNS66     CNS Tumor Class (N=66)
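
The model IDs listed above appear to follow a three-part pattern of algorithm prefix, prediction target, and label set. The sketch below splits the IDs on that assumption using base R only. The mapping from prefix to algorithm name (in particular `mlp` as a multilayer perceptron) is inferred from the introduction rather than stated by the package, and `model_table` is a name introduced here for illustration.

```r
# Model IDs copied from the table of supported models.
model_ids <- c(
  "rfc_cancertype_TCGA33", "svm_cancertype_TCGA33",
  "xgb_cancertype_TCGA33", "mlp_cancertype_TCGA33",
  "rfc_cancertype_CNS66",  "svm_cancertype_CNS66",
  "xgb_cancertype_CNS66",  "mlp_cancertype_CNS66"
)

# Assumed meaning of the algorithm prefixes (inferred, not documented here).
algorithms <- c(
  rfc = "random forest",
  svm = "support vector machine",
  xgb = "extreme gradient boosting",
  mlp = "multilayer perceptron"
)

# Split each ID on "_" into prefix, prediction target, and label set.
parts <- do.call(rbind, strsplit(model_ids, "_", fixed = TRUE))

model_table <- data.frame(
  ModelID   = model_ids,
  Algorithm = unname(algorithms[parts[, 1]]),
  Target    = parts[, 2],
  LabelSet  = parts[, 3],
  stringsAsFactors = FALSE
)

print(model_table)
```

A table like this can make it easier to pick, for example, every model trained on the CNS66 label set with `subset(model_table, LabelSet == "CNS66")`.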

Pan-Cancer type classification

The snippet below applies the same cmi_predict call to a random forest model and a support vector model retrieved from ExperimentHub, illustrating the model abstraction.

## impute missing beta values before prediction
betas <- imputeBetas(sesameDataGet("HM450.1.TCGA.PAAD")$betas)
cmi_predict(betas, ExperimentHub()[["EH8395"]])
## $response
## [1] "PAAD"
## 
## $prob
##  PAAD 
## 0.852
cmi_predict(betas, ExperimentHub()[["EH8396"]])
## $response
## [1] "PAAD"
## 
## $prob
## betas[, attr(model$terms, "term.labels")] 
##                                 0.9864795
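
Both calls return a list with a `$response` element and a `$prob` element, but the name attached to `$prob` differs between the two models. A minimal sketch of putting the two results side by side is shown below. The result lists are re-created by hand from the printed output so the example runs without the package, and `as_prediction_row` is a hypothetical helper, not part of CytoMethIC.

```r
# Results re-created from the printed output above; in a real session these
# would be the return values of the two cmi_predict() calls.
res_first <- list(response = "PAAD", prob = c(PAAD = 0.852))
res_second <- list(
  response = "PAAD",
  prob = setNames(0.9864795, 'betas[, attr(model$terms, "term.labels")]')
)

# Hypothetical helper: flatten one result into a one-row data frame.
# unname() drops the name on $prob, which differs between model types.
as_prediction_row <- function(res, model) {
  data.frame(
    model    = model,
    response = res$response,
    prob     = unname(res$prob),
    stringsAsFactors = FALSE
  )
}

comparison <- rbind(
  as_prediction_row(res_first,  "EH8395"),
  as_prediction_row(res_second, "EH8396")
)
print(comparison)
```

Dropping the names keeps the probability column uniform, so results from different model types can be stacked with `rbind`.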

Pan-Cancer subtype classification

The snippet below uses cmi_predict to predict the subtype of the cancer.

cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, ExperimentHub()[["EH8422"]])
## $response
## [1] "GI.CIN"
## 
## $prob
## GI.CIN 
##  0.462

Ethnicity classification

The snippet below uses cmi_predict to predict the ethnicity of the patient.

cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, ExperimentHub()[["EH8421"]])
## $response
## [1] "WHITE"
## 
## $prob
## WHITE 
## 0.886

Pan-Cancer cell-of-origin (COO) classification

The snippet below uses cmi_predict to predict the cell of origin of the cancer.

cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, ExperimentHub()[["EH8423"]])
## $response
## [1] "C20:Mixed (Stromal/Immune)"
## 
## $prob
## C20:Mixed (Stromal/Immune) 
##                      0.768
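
The reported probabilities vary considerably across tasks, from 0.462 for the subtype call to 0.886 for the ethnicity call. One possible way to review them together is sketched below in base R. The values are copied from the outputs in this document (using the first cancer type result), and the cutoff `min_prob` is a hypothetical value chosen for illustration, not a threshold recommended by the package.

```r
# Top-class probabilities copied from the outputs shown in this document.
predictions <- data.frame(
  task     = c("cancer type", "subtype", "ethnicity", "cell of origin"),
  response = c("PAAD", "GI.CIN", "WHITE", "C20:Mixed (Stromal/Immune)"),
  prob     = c(0.852, 0.462, 0.886, 0.768),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff, for illustration only.
min_prob <- 0.5

# Flag calls whose top-class probability falls below the cutoff.
predictions$low_confidence <- predictions$prob < min_prob

# Show the least confident calls first.
print(predictions[order(predictions$prob), ])
```

With this cutoff only the subtype call is flagged, which suggests treating that prediction with more caution than the others.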