A workflow for identifying enterotypes from the relative abundance of gut microbiota was implemented following the report of Arumugam et al.[^2]
library(mbOmic)
library(data.table)
First, the microbiota relative-abundance dataset was retrieved from the enterotypes website. Missing values were imputed by k-nearest neighbors (KNN) with the `impute` package, and a pseudocount of 0.001 was added to every value.
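dat <- read.delim('http://enterotypes.org/ref_samples_abundance_MetaHIT.txt')
# impute missing values from the 100 nearest neighbors
dat <- impute::impute.knn(as.matrix(dat), k = 100)
# keep the imputed matrix and add a pseudocount of 0.001
dat <- as.data.frame(dat$data + 0.001)
# convert to a data.table, keeping the row names as column `rn`
dat <- setDT(dat, keep.rownames = TRUE)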
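`impute::impute.knn` is used above as a black box. To illustrate the idea behind KNN imputation, here is a minimal base-R sketch. `knn_impute_rows()` is a hypothetical helper written for this illustration and is simplified relative to `impute.knn` (for example, it has no fallback for rows with many missing values); it fills each missing cell with the mean of the `k` most similar rows.

```r
# knn_impute_rows() is a hypothetical helper: a simplified sketch of
# row-wise KNN imputation, not the impute::impute.knn() algorithm.
knn_impute_rows <- function(x, k = 2) {
  stopifnot(is.matrix(x), k >= 1)
  out <- x
  for (i in which(rowSums(is.na(x)) > 0)) {
    miss <- is.na(x[i, ])
    # candidate neighbors: other rows observed at every column missing in row i
    ok <- rowSums(is.na(x[, miss, drop = FALSE])) == 0
    cand <- setdiff(which(ok), i)
    if (length(cand) == 0) next
    # root-mean-square difference over the columns both rows have observed
    d <- vapply(cand, function(j) {
      shared <- !miss & !is.na(x[j, ])
      if (!any(shared)) return(Inf)
      sqrt(mean((x[i, shared] - x[j, shared])^2))
    }, numeric(1))
    nn <- cand[order(d)][seq_len(min(k, length(cand)))]
    # fill each missing cell with the mean of the k nearest rows
    out[i, miss] <- colMeans(x[nn, miss, drop = FALSE])
  }
  out
}

# Usage on a toy matrix (rows = taxa, columns = samples)
x <- rbind(c(1.0, 2.0, 3.0),
           c(1.1, 2.1, NA),
           c(10,  20,  30),
           c(1.2, 2.2, 3.2))
knn_impute_rows(x, k = 2)
```

Here the missing value in row 2 is filled with the mean of rows 1 and 4 (3.1), because they are closest to row 2 on the observed columns.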
Construct the `bSet` object and then estimate the appropriate number of clusters with the `estimate_k` function. `estimate_k` clusters the samples using the Jensen-Shannon divergence, and the number of clusters is optimized with the Calinski-Harabasz (CH) index and the silhouette coefficient. It returns a `verCHI` object, an S3 class containing the optimal clustering result, the optimal number of clusters, the maximum CH index (CHI), a silhouette value, and the Jensen-Shannon divergence matrix.
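To make the Jensen-Shannon divergence described above concrete, the sketch below computes a pairwise distance matrix in base R. `jsd_dist()` is a hypothetical helper; it takes the square root of the divergence (the square root of the JSD is a metric), which is common in enterotype analyses, but whether `estimate_k` does the same is not confirmed here. It also shows why zeros are a problem: in R, `0 * log(0)` is `NaN`, which is presumably why the workflow adds a pseudocount before clustering.

```r
# jsd_dist() is a hypothetical helper: a base-R sketch of a Jensen-Shannon
# distance matrix. Rows are taxa, columns are samples, all values > 0.
jsd_dist <- function(x) {
  stopifnot(is.matrix(x), all(x > 0))
  p <- sweep(x, 2, colSums(x), "/")                 # each sample sums to 1
  kld <- function(a, b) sum(a * log(a / b))         # Kullback-Leibler divergence
  n <- ncol(p)
  d <- matrix(0, n, n, dimnames = list(colnames(p), colnames(p)))
  for (i in seq_len(n - 1)) {
    for (j in seq(i + 1, n)) {
      m <- (p[, i] + p[, j]) / 2                    # midpoint distribution
      jsd <- 0.5 * kld(p[, i], m) + 0.5 * kld(p[, j], m)
      d[i, j] <- d[j, i] <- sqrt(max(jsd, 0))       # sqrt(JSD); max() guards round-off
    }
  }
  as.dist(d)
}

# Usage: samples a and b are similar, c is different
ab <- cbind(a = c(0.70, 0.20, 0.10),
            b = c(0.68, 0.22, 0.10),
            c = c(0.10, 0.20, 0.70))
jsd_dist(ab)
```

With the natural logarithm the divergence is bounded by log(2), so every distance here lies between 0 and sqrt(log(2)).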
dat <- bSet(b = dat)
res <- estimate_k(dat)
res
#> optimal number of cluster: 4
#> Max CHI: 164.642158008611
#> Silhouette: 0.181445495999067
The optimal number of clusters is 4.
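A CH-based choice of k like the one above can be illustrated with a small sketch. `pseudo_f()` and `choose_k()` are hypothetical helpers, not mbOmic's code: they cluster with PAM (k-medoids, from the `cluster` package shipped with R) for each candidate k, score each partition with a distance-based CH statistic (pseudo-F), and keep the k with the largest score. For Euclidean distances this statistic equals the classical CH index; applying it to a JSD matrix is an assumption of the sketch. The average silhouette width is reported alongside.

```r
# pseudo_f() and choose_k() are hypothetical helpers sketching how k can be
# chosen by a Calinski-Harabasz-type criterion; this is not mbOmic's code.
library(cluster)  # pam() and silhouette(); a recommended package shipped with R

# Distance-based CH (pseudo-F): sums of squares from squared pairwise distances
pseudo_f <- function(d, cl) {
  d2 <- as.matrix(d)^2
  n <- length(cl)
  k <- length(unique(cl))
  total <- sum(d2) / (2 * n)                        # total sum of squares
  within <- sum(vapply(split(seq_len(n), cl), function(idx) {
    sum(d2[idx, idx]) / (2 * length(idx))           # within-cluster sum of squares
  }, numeric(1)))
  ((total - within) / (k - 1)) / (within / (n - k))
}

# Run PAM for each candidate k; keep the k with the largest CH value
choose_k <- function(d, ks = 2:5) {
  fits <- lapply(ks, function(k) pam(d, k = k, diss = TRUE)$clustering)
  ch  <- vapply(fits, function(cl) pseudo_f(d, cl), numeric(1))
  sil <- vapply(fits, function(cl) mean(silhouette(cl, d)[, "sil_width"]),
                numeric(1))
  best <- which.max(ch)
  list(k = ks[best], ch = setNames(ch, ks), silhouette = setNames(sil, ks),
       clustering = fits[[best]])
}

# Usage: two well-separated groups of five points each
pts <- rbind(cbind(c(0, 0.1, 0.2, 0, 0.1), c(0, 0.1, 0, 0.2, 0.2)),
             cbind(c(5, 5.1, 5.2, 5, 5.1), c(5, 5.1, 5, 5.2, 5.2)))
choose_k(dist(pts), ks = 2:4)$k
```

In practice the `dist` object would be a JSD matrix such as the one returned by a helper like `jsd_dist()`.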
Next, the `enterotyping` function was used to identify the enterotype of each cluster. It returns a list of length 3 containing two tables (`enterotypes` and `data`) and a vector of unidentified samples (`UnIdentifiedSamples`). Clusters 2, 3, and 4 correspond to the Bacteroides, Prevotella, and Ruminococcus enterotypes, respectively.
ret <- enterotyping(dat, res$verOptCluster)
ret
#> $enterotypes
#> Enterotype max which cluster
#> 1: Bacteroides 0.36724946 2 cluster 2
#> 2: Prevotella 0.29692944 3 cluster 3
#> 3: Ruminococcus 0.02416713 4 cluster 4
#>
#> $data
#> Samples Enterotype cluster
#> 1: MH0087 Bacteroides cluster 2
#> 2: MH0156 Bacteroides cluster 2
#> 3: MH0444 Bacteroides cluster 2
#> 4: MH0333 Bacteroides cluster 2
#> 5: MH0233 Bacteroides cluster 2
#> ---
#> 234: MH0012 Ruminococcus cluster 4
#> 235: MH0415 Ruminococcus cluster 4
#> 236: MH0457 Ruminococcus cluster 4
#> 237: MH0442 Ruminococcus cluster 4
#> 238: MH0448 Ruminococcus cluster 4
#>
#> $UnIdentifiedSamples
#> [1] "MH0277" "MH0161" "MH0046" "MH0175" "MH0152" "MH0104" "MH0151" "MH0189"
#> [9] "MH0030" "MH0157" "MH0063" "MH0075" "MH0141" "MH0169" "MH0050" "MH0286"
#> [17] "MH0096" "MH0053" "MH0217" "MH0098" "MH0009" "MH0197" "MH0065" "MH0173"
#> [25] "MH0168" "MH0070" "MH0077" "MH0288" "MH0200" "MH0031" "MH0183" "MH0132"
#> [33] "MH0144" "MH0124" "MH0430" "MH0276" "MH0407" "MH0428" "MH0126" "MH0447"
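How `enterotyping` maps clusters to enterotypes is not described here. The `max` and `cluster` columns of `$enterotypes` suggest one plausible rule, sketched below purely as an assumption: each marker genus is assigned to the cluster in which its mean relative abundance is highest. `label_clusters()` and the toy data are hypothetical.

```r
# label_clusters() is a hypothetical helper. ASSUMPTION: each marker genus is
# assigned to the cluster in which its mean relative abundance is highest.
# This is an illustrative rule, not necessarily what enterotyping() does.
label_clusters <- function(x, cl,
                           markers = c("Bacteroides", "Prevotella", "Ruminococcus")) {
  stopifnot(all(markers %in% rownames(x)), ncol(x) == length(cl))
  # mean abundance of each taxon (rows) within each cluster (columns)
  means <- sapply(split(seq_along(cl), cl),
                  function(idx) rowMeans(x[, idx, drop = FALSE]))
  m <- means[markers, , drop = FALSE]
  best <- apply(m, 1, which.max)                    # cluster index per marker
  data.frame(Enterotype = markers,
             max = unname(apply(m, 1, max)),
             cluster = colnames(means)[best],
             stringsAsFactors = FALSE)
}

# Usage on a toy abundance matrix (rows = genera, columns = samples)
toy <- rbind(Bacteroides  = c(0.50, 0.40, 0.10, 0.10, 0.10, 0.20),
             Prevotella   = c(0.05, 0.10, 0.60, 0.50, 0.05, 0.05),
             Ruminococcus = c(0.01, 0.02, 0.01, 0.02, 0.30, 0.20),
             Other        = c(0.44, 0.48, 0.29, 0.38, 0.55, 0.55))
label_clusters(toy, cl = c(1, 1, 2, 2, 3, 3))
```

Under this rule a cluster that no marker genus peaks in would stay unlabeled; whether that is how samples end up in `UnIdentifiedSamples` is not confirmed by the output above.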
Furthermore, the clustering result was validated against the enterotype assignments provided by the enterotypes website.
enterotypes <- read.table(system.file('extdata', 'enterotype.txt', package = 'mbOmic'))
# keep the reference labels for the samples in dat, in the same order
enterotypes <- enterotypes[samples(dat), ]
table(res$verOptCluster, enterotypes$ET)
#>
#> ET_B ET_F ET_P
#> 1 0 21 19
#> 2 67 5 0
#> 3 0 0 40
#> 4 3 123 0
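One simple way to summarize the agreement in the table above is cluster purity: the share of samples whose reference label matches the majority label of their cluster (1 means perfect agreement). This summary is offered as an illustration and is not part of the original workflow; `cluster_purity()` is a hypothetical helper, and the counts are copied from the cross-tabulation above.

```r
# cluster_purity() is a hypothetical helper: share of samples that carry
# the majority reference label of their cluster.
cluster_purity <- function(tab) sum(apply(tab, 1, max)) / sum(tab)

# Counts copied from the cross-tabulation above (rows: clusters 1-4)
tab <- matrix(c(0,  21, 19,
                67, 5,  0,
                0,  0,  40,
                3, 123, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(cluster = c("1", "2", "3", "4"),
                              ET = c("ET_B", "ET_F", "ET_P")))
cluster_purity(tab)         # 251/278, about 0.903
cluster_purity(tab[-1, ])   # clusters 2-4 only: 230/238, about 0.966
```

Cluster 1 mixes ET_F and ET_P samples, which is why dropping it raises the purity.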
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R Under development (unstable) (2022-10-25 r83175)
#> os Ubuntu 22.04.1 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2022-11-01
#> pandoc 2.9.2.1 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> ade4 1.7-19 2022-04-19 [2] CRAN (R 4.3.0)
#> AnnotationDbi 1.61.0 2022-11-01 [2] Bioconductor
#> assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.3.0)
#> backports 1.4.1 2021-12-13 [2] CRAN (R 4.3.0)
#> base64enc 0.1-3 2015-07-28 [2] CRAN (R 4.3.0)
#> Biobase 2.59.0 2022-11-01 [2] Bioconductor
#> BiocGenerics 0.45.0 2022-11-01 [2] Bioconductor
#> Biostrings 2.67.0 2022-11-01 [2] Bioconductor
#> bit 4.0.4 2020-08-04 [2] CRAN (R 4.3.0)
#> bit64 4.0.5 2020-08-30 [2] CRAN (R 4.3.0)
#> bitops 1.0-7 2021-04-24 [2] CRAN (R 4.3.0)
#> blob 1.2.3 2022-04-10 [2] CRAN (R 4.3.0)
#> bslib 0.4.0 2022-07-16 [2] CRAN (R 4.3.0)
#> cachem 1.0.6 2021-08-19 [2] CRAN (R 4.3.0)
#> callr 3.7.2 2022-08-22 [2] CRAN (R 4.3.0)
#> checkmate 2.1.0 2022-04-21 [2] CRAN (R 4.3.0)
#> class 7.3-20.1 2022-05-30 [2] CRAN (R 4.3.0)
#> cli 3.4.1 2022-09-23 [2] CRAN (R 4.3.0)
#> cluster 2.1.4 2022-08-22 [2] CRAN (R 4.3.0)
#> clusterSim 0.50-1 2022-05-24 [2] CRAN (R 4.3.0)
#> codetools 0.2-18 2020-11-04 [2] CRAN (R 4.3.0)
#> colorspace 2.0-3 2022-02-21 [2] CRAN (R 4.3.0)
#> crayon 1.5.2 2022-09-29 [2] CRAN (R 4.3.0)
#> data.table * 1.14.4 2022-10-17 [2] CRAN (R 4.3.0)
#> DBI 1.1.3 2022-06-18 [2] CRAN (R 4.3.0)
#> deldir 1.0-6 2021-10-23 [2] CRAN (R 4.3.0)
#> devtools 2.4.5 2022-10-11 [2] CRAN (R 4.3.0)
#> digest 0.6.30 2022-10-18 [2] CRAN (R 4.3.0)
#> doParallel 1.0.17 2022-02-07 [2] CRAN (R 4.3.0)
#> dplyr 1.0.10 2022-09-01 [2] CRAN (R 4.3.0)
#> dynamicTreeCut 1.63-1 2016-03-11 [2] CRAN (R 4.3.0)
#> e1071 1.7-12 2022-10-24 [2] CRAN (R 4.3.0)
#> ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.3.0)
#> evaluate 0.17 2022-10-07 [2] CRAN (R 4.3.0)
#> fansi 1.0.3 2022-03-24 [2] CRAN (R 4.3.0)
#> fastcluster 1.2.3 2021-05-24 [2] CRAN (R 4.3.0)
#> fastmap 1.1.0 2021-01-25 [2] CRAN (R 4.3.0)
#> foreach 1.5.2 2022-02-02 [2] CRAN (R 4.3.0)
#> foreign 0.8-83 2022-09-28 [2] CRAN (R 4.3.0)
#> Formula 1.2-4 2020-10-16 [2] CRAN (R 4.3.0)
#> fs 1.5.2 2021-12-08 [2] CRAN (R 4.3.0)
#> generics 0.1.3 2022-07-05 [2] CRAN (R 4.3.0)
#> GenomeInfoDb 1.35.0 2022-11-01 [2] Bioconductor
#> GenomeInfoDbData 1.2.9 2022-10-29 [2] Bioconductor
#> ggplot2 3.3.6 2022-05-03 [2] CRAN (R 4.3.0)
#> glue 1.6.2 2022-02-24 [2] CRAN (R 4.3.0)
#> GO.db 3.16.0 2022-10-31 [2] Bioconductor
#> gridExtra 2.3 2017-09-09 [2] CRAN (R 4.3.0)
#> gtable 0.3.1 2022-09-01 [2] CRAN (R 4.3.0)
#> highr 0.9 2021-04-16 [2] CRAN (R 4.3.0)
#> Hmisc 4.7-1 2022-08-15 [2] CRAN (R 4.3.0)
#> htmlTable 2.4.1 2022-07-07 [2] CRAN (R 4.3.0)
#> htmltools 0.5.3 2022-07-18 [2] CRAN (R 4.3.0)
#> htmlwidgets 1.5.4 2021-09-08 [2] CRAN (R 4.3.0)
#> httpuv 1.6.6 2022-09-08 [2] CRAN (R 4.3.0)
#> httr 1.4.4 2022-08-17 [2] CRAN (R 4.3.0)
#> igraph 1.3.5 2022-09-22 [2] CRAN (R 4.3.0)
#> impute 1.73.0 2022-11-01 [2] Bioconductor
#> interp 1.1-3 2022-07-13 [2] CRAN (R 4.3.0)
#> IRanges 2.33.0 2022-11-01 [2] Bioconductor
#> iterators 1.0.14 2022-02-05 [2] CRAN (R 4.3.0)
#> jpeg 0.1-9 2021-07-24 [2] CRAN (R 4.3.0)
#> jquerylib 0.1.4 2021-04-26 [2] CRAN (R 4.3.0)
#> jsonlite 1.8.3 2022-10-21 [2] CRAN (R 4.3.0)
#> KEGGREST 1.39.0 2022-11-01 [2] Bioconductor
#> knitr 1.40 2022-08-24 [2] CRAN (R 4.3.0)
#> later 1.3.0 2021-08-18 [2] CRAN (R 4.3.0)
#> lattice 0.20-45 2021-09-22 [2] CRAN (R 4.3.0)
#> latticeExtra 0.6-30 2022-07-04 [2] CRAN (R 4.3.0)
#> lifecycle 1.0.3 2022-10-07 [2] CRAN (R 4.3.0)
#> magrittr 2.0.3 2022-03-30 [2] CRAN (R 4.3.0)
#> MASS 7.3-58.1 2022-08-03 [2] CRAN (R 4.3.0)
#> Matrix 1.5-1 2022-09-13 [2] CRAN (R 4.3.0)
#> matrixStats 0.62.0 2022-04-19 [2] CRAN (R 4.3.0)
#> mbOmic * 1.3.0 2022-11-01 [1] Bioconductor
#> memoise 2.0.1 2021-11-26 [2] CRAN (R 4.3.0)
#> mime 0.12 2021-09-28 [2] CRAN (R 4.3.0)
#> miniUI 0.1.1.1 2018-05-18 [2] CRAN (R 4.3.0)
#> mnormt 2.1.1 2022-09-26 [2] CRAN (R 4.3.0)
#> munsell 0.5.0 2018-06-12 [2] CRAN (R 4.3.0)
#> nlme 3.1-160 2022-10-26 [2] local
#> nnet 7.3-18 2022-09-28 [2] CRAN (R 4.3.0)
#> pillar 1.8.1 2022-08-19 [2] CRAN (R 4.3.0)
#> pkgbuild 1.3.1 2021-12-20 [2] CRAN (R 4.3.0)
#> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.3.0)
#> pkgload 1.3.1 2022-10-28 [2] CRAN (R 4.3.0)
#> png 0.1-7 2013-12-03 [2] CRAN (R 4.3.0)
#> preprocessCore 1.61.0 2022-11-01 [2] Bioconductor
#> prettyunits 1.1.1 2020-01-24 [2] CRAN (R 4.3.0)
#> processx 3.8.0 2022-10-26 [2] CRAN (R 4.3.0)
#> profvis 0.3.7 2020-11-02 [2] CRAN (R 4.3.0)
#> promises 1.2.0.1 2021-02-11 [2] CRAN (R 4.3.0)
#> proxy 0.4-27 2022-06-09 [2] CRAN (R 4.3.0)
#> ps 1.7.2 2022-10-26 [2] CRAN (R 4.3.0)
#> psych 2.2.9 2022-09-29 [2] CRAN (R 4.3.0)
#> purrr 0.3.5 2022-10-06 [2] CRAN (R 4.3.0)
#> R2HTML 2.3.3 2022-05-23 [2] CRAN (R 4.3.0)
#> R6 2.5.1 2021-08-19 [2] CRAN (R 4.3.0)
#> RColorBrewer 1.1-3 2022-04-03 [2] CRAN (R 4.3.0)
#> Rcpp 1.0.9 2022-07-08 [2] CRAN (R 4.3.0)
#> RCurl 1.98-1.9 2022-10-03 [2] CRAN (R 4.3.0)
#> remotes 2.4.2 2021-11-30 [2] CRAN (R 4.3.0)
#> rlang 1.0.6 2022-09-24 [2] CRAN (R 4.3.0)
#> rmarkdown 2.17 2022-10-07 [2] CRAN (R 4.3.0)
#> rpart 4.1.19 2022-10-21 [2] CRAN (R 4.3.0)
#> RSQLite 2.2.18 2022-10-04 [2] CRAN (R 4.3.0)
#> rstudioapi 0.14 2022-08-22 [2] CRAN (R 4.3.0)
#> S4Vectors 0.37.0 2022-11-01 [2] Bioconductor
#> sass 0.4.2 2022-07-16 [2] CRAN (R 4.3.0)
#> scales 1.2.1 2022-08-20 [2] CRAN (R 4.3.0)
#> sessioninfo 1.2.2 2021-12-06 [2] CRAN (R 4.3.0)
#> shiny 1.7.3 2022-10-25 [2] CRAN (R 4.3.0)
#> stringi 1.7.8 2022-07-11 [2] CRAN (R 4.3.0)
#> stringr 1.4.1 2022-08-20 [2] CRAN (R 4.3.0)
#> survival 3.4-0 2022-08-09 [2] CRAN (R 4.3.0)
#> tibble 3.1.8 2022-07-22 [2] CRAN (R 4.3.0)
#> tidyselect 1.2.0 2022-10-10 [2] CRAN (R 4.3.0)
#> urlchecker 1.0.1 2021-11-30 [2] CRAN (R 4.3.0)
#> usethis 2.1.6 2022-05-25 [2] CRAN (R 4.3.0)
#> utf8 1.2.2 2021-07-24 [2] CRAN (R 4.3.0)
#> vctrs 0.5.0 2022-10-22 [2] CRAN (R 4.3.0)
#> visNetwork 2.1.2 2022-09-29 [2] CRAN (R 4.3.0)
#> WGCNA 1.71 2022-04-22 [2] CRAN (R 4.3.0)
#> xfun 0.34 2022-10-18 [2] CRAN (R 4.3.0)
#> xtable 1.8-4 2019-04-21 [2] CRAN (R 4.3.0)
#> XVector 0.39.0 2022-11-01 [2] Bioconductor
#> yaml 2.3.6 2022-10-18 [2] CRAN (R 4.3.0)
#> zlibbioc 1.45.0 2022-11-01 [2] Bioconductor
#>
#> [1] /tmp/RtmpAn6puC/Rinst2baa1a1735f6af
#> [2] /home/biocbuild/bbs-3.17-bioc/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────