Introduction
Clustering is an important method for classifying items into categories and for inferring function, since similar objects tend to behave similarly. More than 200 Bioconductor packages implement clustering algorithms or employ clustering methods for omics data analysis.
Although these methods are important for data analysis, their visualization is quite limited. Most of the packages can only visualize the hierarchical tree structure using stats:::plot.hclust(). This package is designed to visualize hierarchical tree structures with associated data (e.g., clinical information collected with the samples) using the in-house developed ggtree package.
This package implements a set of autoplot() methods to display tree structures. We will implement more autoplot() methods to support more objects. The output of these autoplot() methods is a ggtree object, which can be further annotated by adding layers using ggplot2 syntax. Integrating associated data to annotate the tree is also supported by the ggtreeExtra package.
Here are some demonstrations of using autoplot() methods to visualize common hierarchical clustering tree objects.
hclust and dendrogram objects
These two classes are defined in the stats package.
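A minimal sketch of how such objects might be built and plotted. The USArrests dataset and average linkage are illustrative choices, not taken from the text above, and the plotting step assumes that loading ggtreeDendro registers autoplot() methods for both classes, as the introduction describes.

```r
# Illustrative data: USArrests ships with base R (an assumption for this sketch)
d   <- dist(USArrests)
hc  <- hclust(d, method = "average")   # object of class "hclust"
den <- as.dendrogram(hc)               # object of class "dendrogram"

# Plot only when the plotting packages are installed
if (requireNamespace("ggtreeDendro", quietly = TRUE) &&
    requireNamespace("ggtree", quietly = TRUE)) {
  library(ggtreeDendro)                # makes the autoplot() methods available
  # The result is described above as a ggtree object, so layers can be added with `+`
  p1 <- ggplot2::autoplot(hc) + ggtree::geom_tiplab()
  p2 <- ggplot2::autoplot(den)
  print(p1)
  print(p2)
}
```

The requireNamespace() guard keeps the clustering part runnable on systems where the plotting packages are missing.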
linkage object
The class linkage is defined in the mdendro package.
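A hedged sketch of creating and plotting a linkage object. The dataset and the "complete" method are assumptions chosen for illustration, and the autoplot() call assumes ggtreeDendro provides a method for this class, as the introduction suggests.

```r
# Illustrative distance matrix (an assumption for this sketch)
d <- dist(USArrests)

if (requireNamespace("mdendro", quietly = TRUE)) {
  # linkage() from mdendro builds the hierarchical clustering
  lnk <- mdendro::linkage(d, method = "complete")

  if (requireNamespace("ggtreeDendro", quietly = TRUE) &&
      requireNamespace("ggtree", quietly = TRUE)) {
    library(ggtreeDendro)
    p <- ggplot2::autoplot(lnk) + ggtree::geom_tiplab()
    print(p)
  }
}
```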
agnes, diana and twins objects
These classes are defined in the cluster package.
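As a sketch, agglomerative (agnes) and divisive (diana) clusterings could be created and plotted as follows. The mtcars dataset is an illustrative assumption, and the plotting step assumes ggtreeDendro supplies autoplot() methods for these classes.

```r
library(cluster)

# Illustrative data: mtcars ships with base R (an assumption for this sketch)
x1 <- agnes(mtcars)   # agglomerative nesting; class "agnes", inherits from "twins"
x2 <- diana(mtcars)   # divisive analysis; class "diana", inherits from "twins"

if (requireNamespace("ggtreeDendro", quietly = TRUE) &&
    requireNamespace("ggtree", quietly = TRUE)) {
  library(ggtreeDendro)
  p1 <- ggplot2::autoplot(x1) + ggtree::geom_tiplab()
  p2 <- ggplot2::autoplot(x2) + ggtree::geom_tiplab()
  print(p1)
  print(p2)
}
```

Because both objects inherit from twins, a single method written for twins would cover them; whether the package dispatches that way is not stated in the text.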
pvclust object
The pvclust class is defined in the pvclust package.
library(pvclust)
data(Boston, package = "MASS")
set.seed(123)
result <- pvclust(Boston, method.dist="cor", method.hclust="average", nboot=1000, parallel=TRUE)
## Creating a temporary cluster...done:
## socket cluster with 71 nodes on host 'localhost'
## Multiscale bootstrap... Done.
The pvclust object contains two types of p-values: the AU (Approximately Unbiased) p-value and the BP (Bootstrap Probability) value. These values will be automatically labelled on the tree.
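The following sketch shows where these values live in a pvclust object and how the tree might then be drawn. It reruns the analysis with a reduced nboot and without parallelism purely to keep the example quick; the name pv_small is hypothetical, and the bare autoplot() call assumes the package's method needs no extra arguments to label the values.

```r
if (requireNamespace("pvclust", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE)) {
  data(Boston, package = "MASS")
  set.seed(123)
  # Smaller bootstrap than in the text, for speed only (an assumption of this sketch)
  pv_small <- pvclust::pvclust(Boston, method.dist = "cor",
                               method.hclust = "average",
                               nboot = 100, parallel = FALSE)

  # One row per internal edge of the tree; columns "au" and "bp" hold the two values
  au_bp <- pv_small$edges[, c("au", "bp")]
  print(head(au_bp))

  if (requireNamespace("ggtreeDendro", quietly = TRUE) &&
      requireNamespace("ggtree", quietly = TRUE)) {
    library(ggtreeDendro)
    print(ggplot2::autoplot(pv_small) + ggtree::geom_tiplab())
  }
}
```

With only 100 bootstrap replicates the AU values are noisier than those from the nboot=1000 run shown earlier, so this setting is suitable for trying the code, not for reporting results.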
Session information
Here is the output of sessionInfo() on the system on which this document was compiled:
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] pvclust_2.2-0 cluster_2.1.4 mdendro_2.1.0 aplot_0.2.0
## [5] ggtreeDendro_1.2.1 ggtree_3.8.2 yulab.utils_0.0.7
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.7 utf8_1.2.3 generics_0.1.3 tidyr_1.3.0
## [5] prettydoc_0.4.1 ggplotify_0.1.2 lattice_0.21-8 digest_0.6.33
## [9] magrittr_2.0.3 evaluate_0.21 grid_4.3.1 fastmap_1.1.1
## [13] jsonlite_1.8.7 ape_5.7-1 purrr_1.0.2 fansi_1.0.4
## [17] scales_1.2.1 lazyeval_0.2.2 jquerylib_0.1.4 cli_3.6.1
## [21] rlang_1.1.1 munsell_0.5.0 tidytree_0.4.5 withr_2.5.0
## [25] cachem_1.0.8 yaml_2.3.7 tools_4.3.1 parallel_4.3.1
## [29] memoise_2.0.1 dplyr_1.1.2 colorspace_2.1-0 ggplot2_3.4.3
## [33] vctrs_0.6.3 R6_2.5.1 gridGraphics_0.5-1 lifecycle_1.0.3
## [37] ggfun_0.1.2 treeio_1.24.3 pkgconfig_2.0.3 pillar_1.9.0
## [41] bslib_0.5.1 gtable_0.3.3 glue_1.6.2 Rcpp_1.0.11
## [45] xfun_0.40 tibble_3.2.1 tidyselect_1.2.0 highr_0.10
## [49] knitr_1.43 farver_2.1.1 htmltools_0.5.6 nlme_3.1-163
## [53] patchwork_1.1.3 rmarkdown_2.24 labeling_0.4.2 compiler_4.3.1