--- title: "miaViz" output: BiocStyle::html_document: fig_height: 7 fig_width: 10 toc: yes toc_depth: 2 number_sections: true vignette: > %\VignetteIndexEntry{miaViz} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: references.bib --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) knitr::opts_chunk$set(dev = "png", dev.args = list(type = "cairo-png")) ``` `miaViz` implements plotting function to work with `TreeSummarizedExperiment` and related objects in a context of microbiome analysis. For more general plotting function on `SummarizedExperiment` objects the `scater` package offers several options, such as `plotColData`, `plotExpression` and `plotRowData`. # Installation To install `miaViz`, install `BiocManager` first, if it is not installed. Afterwards use the `install` function from `BiocManager` and load `miaViz`. ```{r, eval=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("miaViz") ``` ```{r setup, message=FALSE} library(miaViz) data(GlobalPatterns, package = "mia") ``` # Abundance plotting in contrast to other fields of sequencing based fields of research for which expression of genes is usually studied, microbiome research uses the more term ~Abundance~ to described the numeric data measured and analyzed. Technically, especially in context of `SummarizedExperiment` objects, there is no difference. Therefore `plotExpression` can be used to plot `Abundance` data. `plotAbundance` can be used as well and as long as `rank` is set `NULL`, it behaves as `plotExpression`. ```{r} plotAbundance(GlobalPatterns, rank = NULL, features = "549322", assay.type = "counts") ``` However, if the `rank` is set not `NULL` a bar plot is returned. At the same time the `features` argument can be set to `NULL` (default). ```{r} GlobalPatterns <- transformCounts(GlobalPatterns, method = "relabundance") ``` ```{r} plotAbundance(GlobalPatterns, rank = "Kingdom", assay.type = "relabundance") ``` With subsetting to selected features the plot can be fine tuned. ```{r} prev_phylum <- getPrevalentTaxa(GlobalPatterns, rank = "Phylum", detection = 0.01) ``` ```{r} plotAbundance(GlobalPatterns[rowData(GlobalPatterns)$Phylum %in% prev_phylum], rank = "Phylum", assay.type = "relabundance") ``` The `features` argument is reused for plotting data along the different samples. In the next example the ~SampleType~ is plotted along the samples. In this case the result is a list, which can combined using external tools, for example `patchwork`. ```{r} library(patchwork) plots <- plotAbundance(GlobalPatterns[rowData(GlobalPatterns)$Phylum %in% prev_phylum], features = "SampleType", rank = "Phylum", assay.type = "relabundance") plots$abundance / plots$SampleType + plot_layout(heights = c(9, 1)) ``` Further example about composition barplot can be found at Orchestrating Microbiome Analysis [@OMA]. # Prevalence plotting To visualize prevalence within the dataset, two functions are available, `plotTaxaPrevalence`, `plotPrevalenceAbundance` and `plotPrevalence`. `plotTaxaPrevalence` produces a so-called landscape plot, which visualizes the prevalence of samples across abundance thresholds. ```{r} plotTaxaPrevalence(GlobalPatterns, rank = "Phylum", detections = c(0, 0.001, 0.01, 0.1, 0.2)) ``` `plotPrevalenceAbundance` plot the prevalence depending on the mean relative abundance on the chosen taxonomic level. ```{r} plotPrevalentAbundance(GlobalPatterns, rank = "Family", colour_by = "Phylum") + scale_x_log10() ``` `plotPrevalence` plot the number of samples and their prevalence across different abundance thresholds. Abundance steps can be adjusted using the `detections` argument, whereas the analyzed prevalence steps is set using the `prevalences` argument. ```{r} plotPrevalence(GlobalPatterns, rank = "Phylum", detections = c(0.01, 0.1, 1, 2, 5, 10, 20)/100, prevalences = seq(0.1, 1, 0.1)) ``` # Tree plotting The information stored in the `rowTree` can be directly plotted. However, sizes of stored trees have to be kept in mind and plotting of large trees rarely makes sense. For this example we limit the information plotted to the top 100 taxa as judged by mean abundance on the genus level. ```{r, message=FALSE} library(scater) library(mia) ``` ```{r} altExp(GlobalPatterns,"Genus") <- agglomerateByRank(GlobalPatterns,"Genus") altExp(GlobalPatterns,"Genus") <- addPerFeatureQC(altExp(GlobalPatterns,"Genus")) rowData(altExp(GlobalPatterns,"Genus"))$log_mean <- log(rowData(altExp(GlobalPatterns,"Genus"))$mean) rowData(altExp(GlobalPatterns,"Genus"))$detected <- rowData(altExp(GlobalPatterns,"Genus"))$detected / 100 top_taxa <- getTopTaxa(altExp(GlobalPatterns,"Genus"), method="mean", top=100L, assay.type="counts") ``` Colour, size and shape of tree tips and nodes can be decorated based on data present in the `SE` object or by providing additional information via the `other_fields` argument. Note that currently information for nodes have to be provided via the `other_fields` arguments. Data will be matched via the `node` or `label` argument depending on which was provided. `label` takes precedent. ```{r plot1, fig.cap="Tree plot using ggtree with tip labels decorated by mean abundance (colour) and prevalence (size)"} plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,], tip_colour_by = "log_mean", tip_size_by = "detected") ``` Tip and node labels can be shown as well. Setting `show_label = TRUE` shows the tip labels only ... ```{r plot2, fig.cap="Tree plot using ggtree with tip labels decorated by mean abundance (colour) and prevalence (size). Tip labels of the tree are shown as well."} plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,], tip_colour_by = "log_mean", tip_size_by = "detected", show_label = TRUE) ``` ... whereas node labels can be selectively shown by providing a named logical vector to `show_label`. Please note that currently `ggtree` can only plot node labels in a rectangular layout. ```{r plot3, fig.cap="Tree plot using ggtree with tip labels decorated by mean abundance (colour) and prevalence (size). Selected node and tip labels are shown."} labels <- c("Genus:Providencia", "Genus:Morganella", "0.961.60") plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,], tip_colour_by = "log_mean", tip_size_by = "detected", show_label = labels, layout="rectangular") ``` Information can also be visualized on the edges of the tree plot. ```{r plot4, fig.cap="Tree plot using ggtree with tip labels decorated by mean abundance (colour) and edges labeled Kingdom (colour) and prevalence (size)"} plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,], edge_colour_by = "Phylum", tip_colour_by = "log_mean") ``` # Graph plotting Similar to tree data, graph data can also be plotted in conjunction with `SummarizedExperiment` objects. Since the graph data in itself cannot be stored in a specialized slot, a graph object can be provided separately or as an element from the `metedata`. Here we load an example graph. As graph data, all objects types accepted by `as_tbl_graph` from the `tidygraph` package are supported. ```{r} data(col_graph) ``` In the following examples, the `weight` data is automatically generated from the graph data. The `SummarizedExperiment` provided is required to have overlapping rownames with the node names of the graph. Using this link the graph plot can incorporated data from the `SummarizedExperiment`. ```{r} plotColGraph(col_graph, altExp(GlobalPatterns,"Genus"), colour_by = "SampleType", edge_colour_by = "weight", edge_width_by = "weight", show_label = TRUE) ``` As mentioned the graph data can be provided from the `metadata` of the `SummarizedExperiment`. ```{r} metadata(altExp(GlobalPatterns,"Genus"))$graph <- col_graph ``` This produces the same plot as shown above. ```{r include=FALSE} plotColGraph(altExp(GlobalPatterns,"Genus"), name = "graph", colour_by = "SampleType", edge_colour_by = "weight", edge_width_by = "weight", show_label = TRUE) ``` # Plotting of serial data ```{r eval=FALSE} # Load data from miaTime package library("miaTime") data(SilvermanAGutData, package="miaTime") tse <- SilvermanAGutData tse <- transformCounts(tse, method = "relabundance") taxa <- getTopTaxa(tse, 2) ``` Data from samples collected along time can be visualized using `plotSeries`. The `x` argument is used to reference data from the `colData` to use as descriptor for ordering the data. The `y` argument selects the feature to show. Since plotting a lot of features is not advised a maximum of 20 features can plotted at the same time. ```{r eval=FALSE} plotSeries(tse, x = "DAY_ORDER", y = taxa, colour_by = "Family") ``` If replicated data is present, data is automatically used for calculation of the `mean` and `sd` and plotted as a range. Data from different assays can be used for plotting via the `assay.type`. ```{r eval=FALSE} plotSeries(tse[taxa,], x = "DAY_ORDER", colour_by = "Family", linetype_by = "Phylum", assay.type = "relabundance") ``` Additional variables can be used to modify line type aesthetics. ```{r eval=FALSE} plotSeries(tse, x = "DAY_ORDER", y = getTopTaxa(tse, 5), colour_by = "Family", linetype_by = "Phylum", assay.type = "counts") ``` # Plotting factor data To visualize the relative relations between two groupings among the factor data, two functions are available for the purpose; `plotColTile` and `plotRowTile`. ```{r} data(GlobalPatterns, package="mia") se <- GlobalPatterns plotColTile(se,"SampleType","Primer") + theme(axis.text.x.top = element_text(angle = 45, hjust = 0)) ``` # DMN fit plotting Searching for groups that are similar to each other among the samples, could be approached with the Dirichlet Multinomial Mixtures [@DMM]. After using `runDMN` from the `mia` package, several k values as a number of clusters are used to observe the best fit (see also `getDMN` and `getBestDMNFit`). To visualize the fit using e.g. "laplace" as a measure of goodness of fit: ```{r, eval=FALSE} data(dmn_se, package = "mia") names(metadata(dmn_se)) # plot the fit plotDMNFit(dmn_se, type = "laplace") ``` # Serial data ordination and trajectories ```{r, error=FALSE, warning=FALSE, results='hide'} if(!requireNamespace("devtools", quietly = TRUE)){ BiocManager::install("devtools") } if(!requireNamespace("miaTime", quietly = TRUE)){ devtools::install_github("microbiome/miaTime") } ``` Principal Coordinates Analysis using Bray-Curtis dissimilarity on the `hitchip1006` dataset: ```{r MDS, eval=FALSE, include=FALSE} library(miaTime) data(hitchip1006, package = "miaTime") tse <- hitchip1006 tse <- transformCounts(tse, method = "relabundance") ## Ordination with PCoA with Bray-Curtis dissimilarity tse <- runMDS(tse, FUN = vegan::vegdist, method = "bray", name = "PCoA_BC", exprs_values = "relabundance", na.rm = TRUE) # plot p <- plotReducedDim(tse, dimred = "PCoA_BC") p ``` Retrieving information about all available trajectories: ```{r eval=FALSE} library(dplyr) # List subjects with two time points selected.subjects <- names(which(table(tse$subject)==2)) # Subjects counts per number of time points available in the data table(table(tse$subject)) %>% as.data.frame() %>% rename(Timepoints=Var1, Subjects=Freq) ``` Lets look at all trajectories having two time points in the data: ```{r eval=FALSE} # plot p + geom_path(aes(x=X1, y=X2, group=subject), arrow=arrow(length = unit(0.1, "inches")), # combining ordination data and metadata then selecting the subjects # Note, scuttle::makePerCellDF could also be used for the purpose. data = subset(data.frame(reducedDim(tse), colData(tse)), subject %in% selected.subjects) %>% arrange(time))+ labs(title = "All trajectories with two time points")+ theme(plot.title = element_text(hjust = 0.5)) ``` Filtering the two time point trajectories by divergence and displaying top 10%: ```{r, eval=FALSE} library(miaTime) # calculating step wise divergence based on the microbial profiles tse <- getStepwiseDivergence(tse, group = "subject", time_field = "time") # retrieving the top 10% divergent subjects having two time points top.selected.subjects <- subset(data.frame(reducedDim(tse), colData(tse)), subject %in% selected.subjects) %>% top_frac(0.1, time_divergence) %>% select(subject) %>% .[[1]] # plot p + geom_path(aes(x=X1, y=X2, color=time_divergence, group=subject), # the data is sorted in descending order in terms of time # since geom_path will use the first occurring observation # to color the corresponding segment. Without the sorting # geom_path will pick up NA values (corresponding to initial time # points); breaking the example. data = subset(data.frame(reducedDim(tse), colData(tse)), subject %in% top.selected.subjects) %>% arrange(desc(time)), # arrow end is reversed, due to the earlier sorting. arrow=arrow(length = unit(0.1, "inches"), ends = "first"))+ labs(title = "Top 10% divergent trajectories from time point one to two")+ scale_color_gradient2(low="white", high="red")+ theme(plot.title = element_text(hjust = 0.5)) ``` Plotting an example of the trajectory with the maximum total divergence: ```{r, eval=FALSE} # Get subject with the maximum total divergence selected.subject <- data.frame(reducedDim(tse), colData(tse)) %>% group_by(subject) %>% summarise(total_divergence = sum(time_divergence, na.rm = TRUE)) %>% filter(total_divergence==max(total_divergence)) %>% select(subject) %>% .[[1]] # plot p + geom_path(aes(x=X1, y=X2, group=subject), data = subset(data.frame(reducedDim(tse), colData(tse)), subject %in% selected.subject) %>% arrange(time), arrow=arrow(length = unit(0.1, "inches")))+ labs(title = "Longest trajectory by divergence")+ theme(plot.title = element_text(hjust = 0.5)) ``` More examples and materials are available at Orchestrating Microbiome Analysis [@OMA]. # Session info ```{r} sessionInfo() ``` # References