Contents

miaViz implements plotting function to work with TreeSummarizedExperiment and related objects in a context of microbiome analysis. For more general plotting function on SummarizedExperiment objects the scater package offers several options, such as plotColData, plotExpression and plotRowData.

1 Installation

To install miaViz, install BiocManager first, if it is not installed. Afterwards use the install function from BiocManager and load miaViz.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("miaViz")
library(miaViz)
data(GlobalPatterns, package = "mia")

2 Abundance plotting

in contrast to other fields of sequencing based fields of research for which expression of genes is usually studied, microbiome research uses the more term Abundance to described the numeric data measured and analyzed. Technically, especially in context of SummarizedExperiment objects, there is no difference. Therefore plotExpression can be used to plot Abundance data.

plotAbundance can be used as well and as long as rank is set NULL, it behaves as plotExpression.

plotAbundance(GlobalPatterns, rank = NULL, 
              features = "549322", abund_values = "counts")

However, if the rank is set not NULL a bar plot is returned. At the same time the features argument can be set to NULL (default).

GlobalPatterns <- transformCounts(GlobalPatterns, method = "relabundance")
plotAbundance(GlobalPatterns, rank = "Kingdom", abund_values = "relabundance")

With subsetting to selected features the plot can be fine tuned.

prev_phylum <- getPrevalentTaxa(GlobalPatterns, rank = "Phylum",
                                detection = 0.01)
plotAbundance(GlobalPatterns[rowData(GlobalPatterns)$Phylum %in% prev_phylum],
              rank = "Phylum",
              abund_values = "relabundance")

The features argument is reused for plotting data along the different samples. In the next example the SampleType is plotted along the samples. In this case the result is a list, which can combined using external tools, for example patchwork.

library(patchwork)
plots <- plotAbundance(GlobalPatterns[rowData(GlobalPatterns)$Phylum %in% prev_phylum],
                       features = "SampleType",
                       rank = "Phylum",
                       abund_values = "relabundance")
plots$abundance / plots$SampleType +
     plot_layout(heights = c(9, 1))

Further example about composition barplot can be found at Orchestrating Microbiome Analysis (Lahti, Shetty, and Ernst 2021).

3 Prevalence plotting

To visualize prevalence within the dataset, two functions are available, plotTaxaPrevalence, plotPrevalenceAbundance and plotPrevalence.

plotTaxaPrevalence produces a so-called landscape plot, which visualizes the prevalence of samples across abundance thresholds.

plotTaxaPrevalence(GlobalPatterns, rank = "Phylum",
                   detections = c(0, 0.001, 0.01, 0.1, 0.2))

plotPrevalenceAbundance plot the prevalence depending on the mean relative abundance on the chosen taxonomic level.

plotPrevalentAbundance(GlobalPatterns, rank = "Family",
                       colour_by = "Phylum") +
    scale_x_log10()

plotPrevalence plot the number of samples and their prevalence across different abundance thresholds. Abundance steps can be adjusted using the detections argument, whereas the analyzed prevalence steps is set using the prevalences argument.

plotPrevalence(GlobalPatterns,
               rank = "Phylum",
               detections = c(0.01, 0.1, 1, 2, 5, 10, 20)/100,
               prevalences = seq(0.1, 1, 0.1))

4 Tree plotting

The information stored in the rowTree can be directly plotted. However, sizes of stored trees have to be kept in mind and plotting of large trees rarely makes sense.

For this example we limit the information plotted to the top 100 taxa as judged by mean abundance on the genus level.

library(scater)
library(mia)
altExp(GlobalPatterns,"Genus") <- agglomerateByRank(GlobalPatterns,"Genus")
altExp(GlobalPatterns,"Genus") <- addPerFeatureQC(altExp(GlobalPatterns,"Genus"))
rowData(altExp(GlobalPatterns,"Genus"))$log_mean <-
    log(rowData(altExp(GlobalPatterns,"Genus"))$mean)
rowData(altExp(GlobalPatterns,"Genus"))$detected <- 
    rowData(altExp(GlobalPatterns,"Genus"))$detected / 100
top_taxa <- getTopTaxa(altExp(GlobalPatterns,"Genus"),
                       method="mean",
                       top=100L,
                       abund_values="counts")

Colour, size and shape of tree tips and nodes can be decorated based on data present in the SE object or by providing additional information via the other_fields argument. Note that currently information for nodes have to be provided via the other_fields arguments.

Data will be matched via the node or label argument depending on which was provided. label takes precedent.

plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,],
            tip_colour_by = "log_mean",
            tip_size_by = "detected")
#> Warning in .numeric_ij(ij = i, x = x, dim = "row"): For rows/cols with the same
#> name, only one is output
#> Warning in convertNode(tree = track, node = oldAlias): Multiple nodes are found
#> to have the same label.
#> Warning in convertNode(tree = value, node = nlk$nodeLab[iRep]): Multiple nodes
#> are found to have the same label.
Tree plot using ggtree with tip labels decorated by mean abundance (colour) and prevalence (size)

Figure 1: Tree plot using ggtree with tip labels decorated by mean abundance (colour) and prevalence (size)

Tip and node labels can be shown as well. Setting show_label = TRUE shows the tip labels only …

plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,],
            tip_colour_by = "log_mean",
            tip_size_by = "detected",
            show_label = TRUE)
#> Warning in .numeric_ij(ij = i, x = x, dim = "row"): For rows/cols with the same
#> name, only one is output
#> Warning in convertNode(tree = track, node = oldAlias): Multiple nodes are found
#> to have the same label.
#> Warning in convertNode(tree = value, node = nlk$nodeLab[iRep]): Multiple nodes
#> are found to have the same label.
Tree plot using ggtree with tip labels decorated by mean abundance (colour) and prevalence (size). Tip labels of the tree are shown as well.

Figure 2: Tree plot using ggtree with tip labels decorated by mean abundance (colour) and prevalence (size)
Tip labels of the tree are shown as well.

… whereas node labels can be selectively shown by providing a named logical vector to show_label.

Please not that currently ggtree only can plot node labels in a rectangular layout.

labels <- c("Genus:Providencia", "Genus:Morganella", "0.961.60")
plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,],
            tip_colour_by = "log_mean",
            tip_size_by = "detected",
            show_label = labels,
            layout="rectangular")
#> Warning in .numeric_ij(ij = i, x = x, dim = "row"): For rows/cols with the same
#> name, only one is output
#> Warning in convertNode(tree = track, node = oldAlias): Multiple nodes are found
#> to have the same label.
#> Warning in convertNode(tree = value, node = nlk$nodeLab[iRep]): Multiple nodes
#> are found to have the same label.
Tree plot using ggtree with tip labels decorated by mean abundance (colour) and prevalence (size). Selected node and tip labels are shown.

Figure 3: Tree plot using ggtree with tip labels decorated by mean abundance (colour) and prevalence (size)
Selected node and tip labels are shown.

Information can also be visualized on the edges of the tree plot.

plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,],
            edge_colour_by = "Phylum",
            tip_colour_by = "log_mean")
#> Warning in .numeric_ij(ij = i, x = x, dim = "row"): For rows/cols with the same
#> name, only one is output
#> Warning in convertNode(tree = track, node = oldAlias): Multiple nodes are found
#> to have the same label.
#> Warning in convertNode(tree = value, node = nlk$nodeLab[iRep]): Multiple nodes
#> are found to have the same label.
Tree plot using ggtree with tip labels decorated by mean abundance (colour) and edges labeled Kingdom (colour) and prevalence (size)

Figure 4: Tree plot using ggtree with tip labels decorated by mean abundance (colour) and edges labeled Kingdom (colour) and prevalence (size)

5 Graph plotting

Similar to tree data, graph data can also be plotted in conjunction with SummarizedExperiment objects. Since the graph data in itself cannot be stored in a specialized slot, a graph object can be provided separately or as an element from the metedata.

Here we load an example graph. As graph data, all objects types accepted by as_tbl_graph from the tidygraph package are supported.

data(col_graph)

In the following examples, the weight data is automatically generated from the graph data. The SummarizedExperiment provided is required to have overlapping rownames with the node names of the graph. Using this link the graph plot can incorporated data from the SummarizedExperiment.

plotColGraph(col_graph,
             altExp(GlobalPatterns,"Genus"),
             colour_by = "SampleType",
             edge_colour_by = "weight",
             edge_width_by = "weight",
             show_label = TRUE)

As mentioned the graph data can be provided from the metadata of the SummarizedExperiment.

metadata(altExp(GlobalPatterns,"Genus"))$graph <- col_graph

This produces the same plot as shown above.

6 Plotting of serial data

library(microbiomeDataSets)
silverman <- SilvermanAGutData()
#> snapshotDate(): 2022-04-19
#> see ?microbiomeDataSets and browseVignettes('microbiomeDataSets') for documentation
#> loading from cache
#> see ?microbiomeDataSets and browseVignettes('microbiomeDataSets') for documentation
#> loading from cache
#> see ?microbiomeDataSets and browseVignettes('microbiomeDataSets') for documentation
#> loading from cache
#> see ?microbiomeDataSets and browseVignettes('microbiomeDataSets') for documentation
#> loading from cache
#> see ?microbiomeDataSets and browseVignettes('microbiomeDataSets') for documentation
#> loading from cache
silverman <- transformCounts(silverman, method = "relabundance")
taxa <- getTopTaxa(silverman, 2)

Data from samples collected along time can be visualized using plotSeries. The x argument is used to reference data from the colData to use as descriptor for ordering the data. The y argument selects the feature to show. Since plotting a lot of features is not advised a maximum of 20 features can plotted at the same time.

plotSeries(silverman,
           x = "DAY_ORDER",
           y = taxa,
           colour_by = "Family")

If replicated data is present, data is automatically used for calculation of the mean and sd and plotted as a range. Data from different assays can be used for plotting via the abund_values.

plotSeries(silverman[taxa,],
           x = "DAY_ORDER",
           colour_by = "Family",
           linetype_by = "Phylum",
           abund_values = "relabundance")

Additional variables can be used to modify line type aesthetics.

plotSeries(silverman,
           x = "DAY_ORDER",
           y = getTopTaxa(silverman, 5),
           colour_by = "Family",
           linetype_by = "Phylum",
           abund_values = "counts")

7 Plotting factor data

To visualize the relative relations between two groupings among the factor data, two functions are available for the purpose; plotColTile and plotRowTile.

data(GlobalPatterns)
se <- GlobalPatterns
plotColTile(se,"SampleType","Primer") +
  theme(axis.text.x.top = element_text(angle = 45, hjust = 0))

8 DMN fit plotting

Searching for groups that are similar to each other among the samples, could be approached with the Dirichlet Multinomial Mixtures (Holmes, Harris, and Quince 2012). After using runDMN from the mia package, several k values as a number of clusters are used to observe the best fit (see also getDMN and getBestDMNFit). To visualize the fit using e.g. “laplace” as a measure of goodness of fit:

data(dmn_se, package = "mia")
names(metadata(dmn_se))
#> [1] "DMN"
# plot the fit
plotDMNFit(dmn_se, type = "laplace")

More examples and materials are available at Orchestrating Microbiome Analysis (Lahti, Shetty, and Ernst 2021).

9 Session info

sessionInfo()
#> R version 4.2.0 RC (2022-04-19 r82224)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] microbiomeDataSets_1.3.0       scater_1.24.0                 
#>  [3] scuttle_1.6.0                  patchwork_1.1.1               
#>  [5] miaViz_1.4.0                   ggraph_2.0.5                  
#>  [7] ggplot2_3.3.5                  mia_1.4.0                     
#>  [9] MultiAssayExperiment_1.22.0    TreeSummarizedExperiment_2.4.0
#> [11] Biostrings_2.64.0              XVector_0.36.0                
#> [13] SingleCellExperiment_1.18.0    SummarizedExperiment_1.26.0   
#> [15] Biobase_2.56.0                 GenomicRanges_1.48.0          
#> [17] GenomeInfoDb_1.32.0            IRanges_2.30.0                
#> [19] S4Vectors_0.34.0               BiocGenerics_0.42.0           
#> [21] MatrixGenerics_1.8.0           matrixStats_0.62.0            
#> [23] BiocStyle_2.24.0              
#> 
#> loaded via a namespace (and not attached):
#>   [1] AnnotationHub_3.4.0           BiocFileCache_2.4.0          
#>   [3] plyr_1.8.7                    igraph_1.3.1                 
#>   [5] lazyeval_0.2.2                splines_4.2.0                
#>   [7] BiocParallel_1.30.0           digest_0.6.29                
#>   [9] yulab.utils_0.0.4             htmltools_0.5.2              
#>  [11] viridis_0.6.2                 magick_2.7.3                 
#>  [13] fansi_1.0.3                   magrittr_2.0.3               
#>  [15] memoise_2.0.1                 ScaledMatrix_1.4.0           
#>  [17] cluster_2.1.3                 DECIPHER_2.24.0              
#>  [19] graphlayouts_0.8.0            colorspace_2.0-3             
#>  [21] rappdirs_0.3.3                blob_1.2.3                   
#>  [23] ggrepel_0.9.1                 xfun_0.30                    
#>  [25] dplyr_1.0.8                   crayon_1.5.1                 
#>  [27] RCurl_1.98-1.6                jsonlite_1.8.0               
#>  [29] ape_5.6-2                     glue_1.6.2                   
#>  [31] polyclip_1.10-0               gtable_0.3.0                 
#>  [33] zlibbioc_1.42.0               DelayedArray_0.22.0          
#>  [35] BiocSingular_1.12.0           scales_1.2.0                 
#>  [37] DBI_1.1.2                     Rcpp_1.0.8.3                 
#>  [39] xtable_1.8-4                  viridisLite_0.4.0            
#>  [41] decontam_1.16.0               gridGraphics_0.5-1           
#>  [43] tidytree_0.3.9                bit_4.0.4                    
#>  [45] rsvd_1.0.5                    httr_1.4.2                   
#>  [47] RColorBrewer_1.1-3            ellipsis_0.3.2               
#>  [49] pkgconfig_2.0.3               farver_2.1.0                 
#>  [51] dbplyr_2.1.1                  sass_0.4.1                   
#>  [53] utf8_1.2.2                    AnnotationDbi_1.58.0         
#>  [55] later_1.3.0                   ggplotify_0.1.0              
#>  [57] tidyselect_1.1.2              labeling_0.4.2               
#>  [59] rlang_1.0.2                   reshape2_1.4.4               
#>  [61] munsell_0.5.0                 BiocVersion_3.15.2           
#>  [63] tools_4.2.0                   cachem_1.0.6                 
#>  [65] cli_3.3.0                     DirichletMultinomial_1.38.0  
#>  [67] generics_0.1.2                RSQLite_2.2.12               
#>  [69] ExperimentHub_2.4.0           evaluate_0.15                
#>  [71] stringr_1.4.0                 fastmap_1.1.0                
#>  [73] yaml_2.3.5                    ggtree_3.4.0                 
#>  [75] knitr_1.38                    bit64_4.0.5                  
#>  [77] tidygraph_1.2.1               purrr_0.3.4                  
#>  [79] KEGGREST_1.36.0               nlme_3.1-157                 
#>  [81] sparseMatrixStats_1.8.0       mime_0.12                    
#>  [83] aplot_0.1.3                   compiler_4.2.0               
#>  [85] png_0.1-7                     interactiveDisplayBase_1.34.0
#>  [87] filelock_1.0.2                curl_4.3.2                   
#>  [89] beeswarm_0.4.0                treeio_1.20.0                
#>  [91] tibble_3.1.6                  tweenr_1.0.2                 
#>  [93] bslib_0.3.1                   stringi_1.7.6                
#>  [95] highr_0.9                     lattice_0.20-45              
#>  [97] Matrix_1.4-1                  vegan_2.6-2                  
#>  [99] permute_0.9-7                 vctrs_0.4.1                  
#> [101] pillar_1.7.0                  lifecycle_1.0.1              
#> [103] BiocManager_1.30.17           jquerylib_0.1.4              
#> [105] BiocNeighbors_1.14.0          cowplot_1.1.1                
#> [107] bitops_1.0-7                  irlba_2.3.5                  
#> [109] httpuv_1.6.5                  R6_2.5.1                     
#> [111] promises_1.2.0.1              bookdown_0.26                
#> [113] gridExtra_2.3                 vipor_0.4.5                  
#> [115] MASS_7.3-57                   assertthat_0.2.1             
#> [117] withr_2.5.0                   GenomeInfoDbData_1.2.8       
#> [119] mgcv_1.8-40                   parallel_4.2.0               
#> [121] grid_4.2.0                    ggfun_0.0.6                  
#> [123] beachmat_2.12.0               tidyr_1.2.0                  
#> [125] rmarkdown_2.14                DelayedMatrixStats_1.18.0    
#> [127] ggnewscale_0.4.7              ggforce_0.3.3                
#> [129] shiny_1.7.1                   ggbeeswarm_0.6.0

References

Holmes, Ian, Keith Harris, and Christopher Quince. 2012. “Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics.” PLOS ONE. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030126.

Lahti, Leo, Sudarshan Shetty, and Felix GM Ernst. 2021. “Orchestrating Microbiome Analysis.” https://microbiome.github.io/OMA/.