%\VignetteIndexEntry{Analysis examples using phyloseq} %\VignetteKeywords{distance, ordination, plot, test, analysis} %\VignettePackage{phyloseq} % % NOTE -- ONLY EDIT THE .Rnw FILE!!! The .tex file is % likely to be overwritten. % \documentclass[10pt]{article} \usepackage{times} \usepackage{hyperref} \usepackage[width=.85\textwidth,font=small,labelfont=bf]{caption} \usepackage{subfig} \textwidth=6.5in \textheight=8.5in %\parskip=.3cm \oddsidemargin=-.1in \evensidemargin=-.1in \headheight=-.3in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\R}{{\textsf{R}}} \newcommand{\code}[1]{{\texttt{#1}}} \newcommand{\term}[1]{{\emph{#1}}} \newcommand{\Rpackage}[1]{\textsf{#1}} \newcommand{\Rfunction}[1]{\texttt{#1}} \newcommand{\Robject}[1]{\texttt{#1}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\textit{#1}}} \newcommand{\Rfunarg}[1]{{\textit{#1}}} %\bibliographystyle{plainnat} % works %\bibliographystyle{nature} %\bibliographystyle{biblatex-nature} %\bibliographystyle{science} %\bibliographystyle{biblatex-science} %\bibliographystyle{pnas2009} % works \bibliographystyle{naturemag} \begin{document} \title{ Vignette for phyloseq: A Bioconductor package for handling and analysis of high-throughput phylogenetic sequence data } \author{Paul J. M{c}Murdie$^*$ and Susan Holmes\\ Statistics Department, Stanford University,\\ Stanford, CA 94305, USA\\ $^*$E-mail: mcmurdie@stanford.edu\\ https://github.com/joey711/phyloseq} \date{\today} \maketitle \tableofcontents \clearpage \section{Summary} The analysis of microbiological communities brings many challenges: the integration of many different types of data with methods from ecology, genetics, phylogenetics, network analysis, visualization and testing. The data itself may originate from widely different sources, such as the microbiomes of humans, soils, surface and ocean waters, wastewater treatment plants, industrial facilities, and so on; and as a result, these varied sample types may have very different forms and scales of related data that is extremely dependent upon the experiment and its question(s). The phyloseq package is a tool to import, store, analyze, and graphically display complex phylogenetic sequencing data that has already been clustered into \underline{O}perational \underline{T}axonomic \underline{U}nits (OTUs), especially when there is associated sample data, phylogenetic tree, and/or taxonomic assignment of the OTUs. This package leverages many of the tools available in R for ecology and phylogenetic analysis (vegan, ade4, ape, picante), while also using advanced/flexible graphic systems (ggplot2) to easily produce publication-quality graphics of complex phylogenetic data. phyloseq uses a specialized system of S4 classes to store all related phylogenetic sequencing data as single experiment-level object, making it easier to share data and reproduce analyses. In general, phyloseq seeks to facilitate the use of R for efficient interactive and reproducible analysis of OTU-clustered high-throughput phylogenetic sequencing data. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % About this vignette %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %\clearpage \section{About this vignette} A separate vignette is included within the phyloseq-package that describes the basics of importing pre-clustered phylogenetic sequencing data, data filtering, as well as some transformations and some additional details about the package and installation. A quick way to load it is: <>= vignette("phyloseq_basics") @ By contrast, this vignette is intended to provide functional examples of the analysis tools and wrappers included in phyloseq. All necessary code for performing the analysis and producing graphics will be included with its description, and the focus will be on the use of example data that is included and documented within the phyloseq-package. Let's start by loading the \code{phyloseq-package}: <<>>= library("phyloseq") @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Data}\label{sec:data} To facilitate testing and exploration of tools in phyloseq, this package includes example data from published studies. Many of the examples in this vignette use either the \code{GlobalPatterns} or \code{enterotype} datasets as source data. The \code{GlobalPatterns} data was described in an article in PNAS in 2011~\cite{Caporaso15032011}, and compares the microbial communities of 25 environmental samples and three known ``mock communities'' --- a total of 9 sample types --- at a depth averaging 3.1 million reads per sample. The \code{enterotype} dataset was described in a 2011 article in Nature~\cite{Arumugam:2011fk}, which compares, the faecal microbial communities from 22 subjects using complete shotgun DNA sequencing. The authors further compare these microbial communities with the faecal communities of subjects from other studies, for a total of 280 faecal samples / subjects, and 553 genera. Sourcing data from different studies invariable leads to gaps in the data for certain variables, and this is easily handled by \code{R}'s core \code{NA} features. Because this data is included in the package, the examples can easily be run on your own computer using the code shown in this vignette. The data is loaded into memory using the \code{data} command. Let's start by loading the \code{GlobalPatterns} data. <>= data(GlobalPatterns) @ Later on we will use an additional categorical designation --- human versus non-human associated samples --- that was not in the original dataset. Now is a good time to add it as an explicit variable of the \code{sampleData}, and because we don't want to type long words over and over, we'll choose a shorter name for this modified version of \code{GlobalPatterns}, call it \code{GP}, and also remove a handful of taxa that are not present in any of the samples included in this dataset (probably an artifact): <<>>= # prune OTUs that are not present in at least one sample GP <- prune_species(speciesSums(GlobalPatterns) > 0, GlobalPatterns) # Define a human-associated versus non-human categorical variable: human.levels <- levels( getVariable(GP, "SampleType") ) %in% c("Feces", "Mock", "Skin", "Tongue") human <- human.levels[getVariable(GP, "SampleType")] names(human) <- sample.names(GP) # Add new human variable to sample data: sampleData(GP)$human <- factor(human) @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Simple graphics %\clearpage \section{Simple exploratory graphics}\label{sec:explots} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Easy Richness Estimates} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% We can easily create a complex graphic that compares the richness estimates of samples from different environment types in the \code{GlobalPatterns} dataset, using the \code{plot{\_}richness{\_}estimates} function. Note that it is important to use raw (untrimmed) OTU-clustered data when performing richness estimates, as they are highly dependent on the number of singletons in a sample. <>= p <- plot_richness_estimates(GP, "human", "SampleType") (p <- p + geom_boxplot(data=p$data, aes(x=human, y=value, color=NULL), alpha=0.1)) @ <>= ggsave("phyloseq_analysis-richness_estimates.pdf", p, width=11, height=7) @ \begin{figure}[htbp] \centering \includegraphics[width=1.0\textwidth]{phyloseq_analysis-richness_estimates} \captionsetup{width=1.0\textwidth} \caption{Estimates of the species richness of samples in the ``Global Patterns'' dataset. Each panel shows a different type of ``estimate''. \code{S.obs}, observed richness; \code{S.chao1}, Chao1 estimate; \code{S.ACE}, an ACE estimate. Individual color-shaded points and brackets represent the richness estimate and the theoretical standard error range associated with that estimate, respectively. The colors group the sample-sources into ``types''. Within each panel, the samples are further organized into human-associated (\code{TRUE}) or not (\code{FALSE}), and a boxplot is overlayed on top of this for the two groups, illustrating that these human-associated samples are less rich than the environmental samples (although the type of environment appears to matter a great deal as well).} \label{fig:richness} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \clearpage \subsection{Exploratory tree plots} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \emph{phyloseq} also contains a method for easily annotating a phylogenetic tree with information regarding the sample in which a particular taxa was observed, and optionally the number of individuals that were observed. For the sake of creating a readable tree, let's subset the data to just the \href{http://en.wikipedia.org/wiki/Chlamydiae}{Chlamydiae} phylum, which consists of obligate intracellular pathogens and is present in only a subset of environments in this dataset. <<>>= GP.chl <- subset_species(GP, Phylum=="Chlamydiae") @ And now we will create the tree graphic form this subset of \code{GlobalPatterns}, shading by the \code{"SampleType"} variable, which indicates the environment category from which the microbiome samples originated. The following command also takes the option of labelling the number of individuals observed in each sample (if at all) of each taxa. The symbols are slightly enlarged as the number of individuals increases. <>= plot_tree_phyloseq(GP.chl, color_factor="SampleType", type_abundance_value=TRUE, treeTitle="") @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \clearpage \subsection{Exploratory bar plots}\label{sec:barplots} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% In the following example we use the included ``enterotype'' dataset~\cite{Arumugam:2011fk}. <<>>= data(enterotype) @ We start with a simple rank-abundance barplot, using the cumulative fractional abundance of each OTU in the dataset. In the enterotype dataset, the available published data are simplified as sample-wise fractional occurrences, rather than counts of individuals\footnote{Unfortunate, as this means we lose information about the total number of reads and associated confidences, ability to do more sophisticated richness estimates, etc. For example, knowing that we observed 1 sequence read of a species out of 100 total reads means something very different from observing 10,000 reads out of 1,000,000 total.}, and OTUs are clustered/labeled at the genus level, but no other taxonomic assignment is available. For the barplot in Figure~\ref{fig:abundbarplot}, we further normalize by the total number of samples (\Sexpr{nsamples(enterotype)}). <>= par(mar = c(10, 4, 4, 2) + 0.1) # make more room on bottom margin N <- 30 barplot(sort(speciesSums(enterotype), TRUE)[1:N]/nsamples(enterotype), las=2) @ \begin{figure}[htbp] \centering \includegraphics[trim = 10mm 10mm 10mm 15mm, clip, width=0.6\textwidth]{phyloseq_analysis-EntAbundPlot} \caption{An example exploratory barplot using base \code{R} graphics and the \code{speciesSums} and \code{nsamples} functions.} \label{fig:abundbarplot} \end{figure} Note that this first barplot is clipped at the \Sexpr{N}th OTU. This was chosen because \code{nspecies(enterotype) =} \Sexpr{nspecies(enterotype)} OTUs would not be legible on the plot. As you can see, the relative abundances have decreased dramatically by the 10th-ranked OTU. So what are these OTUs? In the \code{enterotype} dataset, only a single taxonomic rank type is present: <<>>= rank.names(enterotype) @ This means the OTUs in this dataset have been grouped at the level of genera, and no other taxonomic grouping/transformation is possible without additional information (like might be present in a phylogenetic tree, or with further taxonomic classification analysis). We need to know which taxonomic rank classifiers, if any, we have available to specify in the second barplot function in this example, \code{plot{\_}taxa{\_}bar()}. We have already observed how quickly the abundance decreases with rank, so wo we will subset the enterotype dataset to the most abundant \code{N} taxa in order to make the barplot legible on this page. <<>>= TopNOTUs <- names(sort(speciesSums(enterotype), TRUE)[1:10]) ent10 <- prune_species(TopNOTUs, enterotype) print(ent10) @ Note also that there are \Sexpr{nsamples(ent10)} samples in this dataset, and so a remaining challenge is to consolidate these samples into meaningful groups. A good place to look is the available sample variables, which in most cases will carry more ``meaning'' than the sample names alone. <<>>= sample.variables(ent10) @ The parameters to \code{plot{\_}taxa{\_}bar} in the following code-chunk were chosen after various trials. We suggest that you also try different parameter settings while you're exploring different features of the data. In addition to the variables names of \code{sampleData}, the \code{plot{\_}taxa{\_}bar()} function recognizes a special parameter name \code{"TaxaGroup"}, which is not (should not be) a sample variable name in \mbox{\code{sampleData(enterotype)}}, but instead indicates that the particular graphic parameter should group values by the taxanomic rank specified in the \code{taxavec} argument. In this example we have also elected to separate the samples by ``facets'' (separate, adjacent sub-plots) according to the enterotype to which they have been assigned. Within each enterotype facet, the samples are further separated by sequencing technology, and the genera is indicated by fill color. Multiple samples having the same enterotype designation and sequencing technology are plotted side-by-side as separate bars. <>= (p <- plot_taxa_bar(ent10, "Genus", x="SeqTech", fill="TaxaGroup") + facet_wrap(~Enterotype) ) @ <>= ggsave("phyloseq_analysis-entbarplot.pdf", p, width=8, height=6) @ \begin{figure}[htbp] \centering \includegraphics[width=1.1\textwidth]{phyloseq_analysis-entbarplot} \captionsetup{width=1.0\textwidth} \caption{An example exploratory barplot using the \code{plot{\_}taxa{\_}bar()} function. In this case we have faceted the samples according to their assigned Enterotype. The small subset of samples in the dataset that do not have an assigned Enterotype are shown in the \code{NA} panel. Within each Enterotype facet, the samples are further separated by sequencing technology, and each genera is shaded a different color. Multiple samples from the same Enterotype and sequencing technology are plotted side-by-side as separate bars (dodged).} \label{fig:barplot} \end{figure} Figure~\ref{fig:barplot} summarizes quantitatively the increased abundances of \emph{Bacteroides} and \emph{Prevotella} in the Enterotypes 1 and 2, respectively. Interestingly, a large relative abundance of \emph{Blautia} was observed for Enterotype 3, but only from 454-pyrosequencing data sets, not the Illumina or Sanger datasets. This suggests the increased \emph{Blautia} might actually be an artifact. Similarly, \emph{Prevotella} appears to be one of the most abundant genera in the Illumina-sequenced samples among Enterotype 3, but this is not reproduced in the 454-pyrosequencing or Sanger sequencing data. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Exploratory Analysis, and more complex exploratory graphics. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \clearpage \section{Exploratory analysis and graphics}\label{sec:analysisplots} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % makenetwork %\clearpage \subsection{Microbiome Network Representation}\label{sec:network} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Continuing with the \code{enterotype} dataset, here are some examples for creating a custom network representation of the relationship between microbiome samples in an experiment. This relies heavily on the \code{igraph} and \code{ggplot2} packages to create a network display of the ``connectedness'' of samples according to some user-provided ecological similarity. By default, the position of points (samples) are determined using an algorithm that optimizes the clarity of the display of network ``edges'', but the spatial position of points does not imply any continuous similarity information like would be the case in an ordination. In this example, the default dissimilarity index was used (Jaccard, co-occurrence), with a maximum distance of \code{0.3} required to create an edge. Any function that can operate on phyloseq-objects and return a sample-wise distance can be provided as the \code{dist.fun} argument, or a character string of the name of the distance function already supported in phyloseq. Other distances may result in very different clustering, and this is a choice that should be understood and not taken too lightly, although there is little harm in trying many different distances. <>= data(enterotype) ig <- make_sample_network(enterotype, max.dist=0.3) (p <- plot_sample_network(ig, enterotype, color="SeqTech", shape="Enterotype", line_weight=0.4, label=NULL)) @ <>= ggsave("phyloseq_analysis-plot_sample_network.pdf", p, width=7, height=7) @ Interestingly, at this level of analysis and parameter-settings the two major sub-graphs appear to be best explained by the sequencing technology and not the subject enterotype (Figure~\ref{fig:makenetwork}), suggesting that the choice of sequencing technology has a major effect on the microbial community one can observe. This seems to differ somewhat with the inferences described in the ``enterotype'' article~\cite{Arumugam:2011fk}. However, there could be some confounding or hidden variables that might also explain this phenomenon, and the well-known differences in the sequence totals between the technologies may also be an important factor. Furthermore, since this is clearly an experimental artifact (and they were including data from multiple studies that were not orginally planned for this purpose), it may be that the enterotype observation can also be shown in a network analysis of this data after removing the effect of sequencing technology and related sequencing effort. Such an effort would be interesting to show here, but is not yet included. \begin{figure}[htbp] \centering \includegraphics[width=0.7\textwidth]{phyloseq_analysis-plot_sample_network.pdf} \caption{Network representation of the relationship between microbiome samples in the ``Enterotype'' dataset~\cite{Arumugam:2011fk}.} \label{fig:makenetwork} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Ordination \clearpage \subsection{Ordination Methods}\label{sec:ordination} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Ordination methods can be a useful tool for exploring complex phylogenetic sequencing data, particularly when the hypothesized structure of the data is poorly defined (or there isn't a hypothesis). The phyloseq package provides some useful tools for performing ordinations and plotting their results, via the \code{ordinate()} and \code{plot{\_}ordination()} functions, respectively. Although there are many options and methods supported, a first-step will probably look something like the following: <>= my.physeq <- import("Biom", BIOMfilename="myBiomFile.biom") my.ord <- ordinate(my.physeq) plot_ordination(my.physeq, my.ord, color="myFavoriteVarible") @ It is probably a good idea to read the documentation for these two functions, as they also provide links to related functions and additional examples you can try immediately on your own machine. <>= ?import ?ordinate ?distance ?plot_ordination @ \subsubsection{Principal Coordinates Analysis (PCoA)}\label{sec:PCoA} We take as our first example, a reproduction of Figure 5 from the ``Global Patterns'' article~\cite{Caporaso15032011}. The authors show a 3-dimensional representation of the first three axes of a Principal Coordinates Analysis (PCoA\footnote{This is also sometimes referred to as ``Multi-Dimensional Scaling'', or ``MDS''}) performed on the unweighted-UniFrac distance (see section~\ref{sec:unifrac}) using all of the available sequences (their approach included both 5' and 3' sequences). According to the authors, ``the first axis [appears to be associated with a] host associated/free living [classification],'' and similarly the third axis with ``saline/nonsaline environment[s].'' The following reproduces the unweighted UniFrac distance calculation on the full dataset. Note that this calculation can take a long time because of the large number of OTUs. Parallelization is recommended for large datasets, typically if they are as large as \code{GlobalPatterns}, or larger. For details on parallelization, see the details section and examples in the \code{UniFrac()} documentation, and also the page dedicated to the topic on the phyloseq-wiki: \url{https://github.com/joey711/phyloseq/wiki/Fast-Parallel-UniFrac} <<>>= data(GlobalPatterns) @ <>= GPUF <- UniFrac(GlobalPatterns) @ <>= # Loads the pre-computed distance matrix, GPUF load("Unweighted_UniFrac.RData") @ <<>>= GloPa.pcoa <- pcoa(GPUF) @ Before we look at the results, let's first investigate how much of the total distance structure we will capture in the first few axes. We can do this graphically with a ``scree plot'', an ordered barplot of the relative fraction of the total eigenvalues associated with each axis (Fig.~\ref{fig:PCoAScree}). <>= barplot(GloPa.pcoa$values$Relative_eig) @ \begin{figure}[htbp] \centering \includegraphics[trim = 10mm 20mm 10mm 20mm, clip, width=0.6\textwidth]{phyloseq_analysis-PCoAScree} \captionsetup{width=0.6\textwidth} \caption{Scree plot of the PCoA used to create Figure 5 from the ``Global Patterns'' article~\cite{Caporaso15032011}. The first three axes represent \Sexpr{round(100*sum(GloPa.pcoa$values$Relative_eig[1:3]))}\% of the total variation in the distances. Interestingly, the fourth axis represents another \Sexpr{round(100*(GloPa.pcoa$values$Relative_eig[4]))}\%, and so may warrant exploration as well. A scree plot is an important tool for any ordination method, as the relative importance of axes can vary widely from one dataset to another.} \label{fig:PCoAScree} \end{figure} \clearpage Next, we will reproduce Figure 5 from the ``Global Patterns'' article~\cite{Caporaso15032011}, but separating the three axes into 2 plots using \code{plot{\_}ordination()} (Fig.~\ref{fig:GPfig5}). <>= (p12 <- plot_ordination(GlobalPatterns, GloPa.pcoa, "samples", color="SampleType") + geom_line() + geom_point(size=5) + scale_colour_hue(legend = FALSE) ) (p13 <- plot_ordination(GlobalPatterns, GloPa.pcoa, "samples", axes=c(1, 3), color="SampleType") + geom_line() + geom_point(size=5) ) @ <>= ggsave("phyloseq_analysis-GPfig5ax12.pdf", p12, width=7, height=7) ggsave("phyloseq_analysis-GPfig5ax13.pdf", p13, width=9, height=7) @ \begin{figure}[htbp] \centering \subfloat[Axes 1, 2]{\label{fig:sub:GPCA12}\includegraphics[width=0.45\textwidth]{phyloseq_analysis-GPfig5ax12}} \subfloat[Axes 1, 3]{\label{fig:sub:GPCA34}\includegraphics[width=0.55\textwidth]{phyloseq_analysis-GPfig5ax13}} %\includegraphics[width=0.5\textwidth]{phyloseq_analysis-GPfig5ax12} %\includegraphics[width=0.5\textwidth]{phyloseq_analysis-GPfig5ax13} \captionsetup{width=0.9\textwidth} \caption{A reproduction in \emph{phyloseq} / \code{R} of the main panel of Figure 5 from the ``Global Patterns'' article~\cite{Caporaso15032011}, on two plots. The horizontal axis represents the first axis in the PCoA ordination, while the top and bottom vertical axes represent the second and third axes, respectively. Different points represent different samples within the dataset, and are shaded according to the environment category to which they belong. The color scheme is the default used by \emph{ggplot}. } \label{fig:GPfig5} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % NMDS %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %\clearpage \subsubsection{non-metric Multi-Dimensional Scaling (NMDS)}\label{sec:NMDS} We repeat the previous example, but instead using non-metric multidimensional scaling (NMDS - \code{metaMDS()}) limited to just two dimensions. This approach limits the amount of residual distance ``not shown'' in the first two (or three) axes, but forefeits some mathematical properties and does not always converge within the specified number of axes. <>= # (Re)load UniFrac distance matrix and GlobalPatterns data data(GlobalPatterns) load("Unweighted_UniFrac.RData") # reloads GPUF variable GP.NMDS <- metaMDS(GPUF, k=2) # perform NMDS, set to 2 axes (p <- plot_ordination(GlobalPatterns, GP.NMDS, "samples", color="SampleType") + geom_line() + geom_point(size=5) ) @ <>= ggsave("phyloseq_analysis-GP_UF_NMDS.pdf", p, width=9, height=7) @ \begin{figure}[htbp] \centering \includegraphics[width=0.6\textwidth]{phyloseq_analysis-GP_UF_NMDS} \caption{An example exploratory ordination using non-metric multidimensional scaling (NMDS) on the unweighted UniFrac distance between samples of the ``Global Patterns'' dataset. Sample points are shaded by environment type, and connected by a line if they belong to the same type. Compare with Figure 5 from the ``Global Patterns'' article~\cite{Caporaso15032011}.} \label{fig:GP_UF_NMDS} \end{figure} Figure~\ref{fig:GP_UF_NMDS} nicely shows the relative dissimilarities between microbial communities from different habitats. However, it fails to indicate \emph{what} was different between the communities. For an ordination method that provides information on the taxa that explain differences between samples (or groups of samples), we use Correspondence Analysis (Section~\ref{sec:CA}). %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \clearpage \subsubsection{Correspondence Analysis (CA)}\label{sec:CA} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% In the following section we will show continue our exploration of the ``GlobalPatterns'' dataset using various features of an ordination method called Correspondence Analysis. We give special emphasis to exploratory interpretations using the biplot, because it provides additional information that is not available from PCoA or NMDS. Let's start by performing a Correspondence Analysis and investigating the scree plot (Figure~\ref{fig:GPCAscree}). Both interestingly and challengingly, the scree plot suggests that the \code{GlobalPatterns} abundance data is quite high-dimensional, with the first two CA axes accounting for not quite 17\% of the total (chi-square) variability. Note the absence of a steep decline in eigenvalue fraction as axis number increases. Each additional axis represents only marginally less variability than the previous. It is often more convenient if the first two (or three) axes account for most of the variability. First, let's severely subset the number of species for the sake of run-time.\footnote{This is for illustration purposes only, do not repeat unless you are very sure you have a good reason for doing this.} <>= data(GlobalPatterns) # Take a subset of the GP dataset, top 200 species topsp <- names(sort(speciesSums(GlobalPatterns), TRUE)[1:200]) GP <- prune_species(topsp, GlobalPatterns) # Subset further to top 5 phyla, among the top 200 OTUs. top5ph <- sort(tapply(speciesSums(GP), taxTab(GP)[, "Phylum"], sum), decreasing=TRUE)[1:5] GP <- subset_species(GP, Phylum %in% names(top5ph)) # Re-add human variable to sample data: sampleData(GP)$human <- factor(human) @ Now perform the correspondence analysis. <>= # Now perform a unconstrained correspondence analysis gpca <- ordinate(GP, "CCA") barplot(gpca$CA$eig/sum(gpca$CA$eig), las=2) @ \begin{figure}[htbp] \centering \includegraphics[trim = 10mm 10mm 10mm 15mm, clip, width=0.7\textwidth]{phyloseq_analysis-GPCAscree} \caption{The correspondence analysis (CA) scree plot of the ``Global Patterns'' dataset~\cite{Caporaso15032011}.} \label{fig:GPCAscree} \end{figure} % \clearpage Now let's investigate how the samples behave on the first few CA axes. <>= (p12 <- plot_ordination(GP, gpca, "samples", color="SampleType") + geom_line() + geom_point(size=5) ) (p34 <- plot_ordination(GP, gpca, "samples", axes=c(3, 4), color="SampleType") + geom_line() + geom_point(size=5) ) @ <>= ggsave("phyloseq_analysis-GPCA12.pdf", p12, width=9, height=7) ggsave("phyloseq_analysis-GPCA34.pdf", p34, width=9, height=7) @ \begin{figure}[htbp] \centering \subfloat[Axes 1, 2]{\label{fig:sub:GPCA12}\includegraphics[width=0.5\textwidth]{phyloseq_analysis-GPCA12}} \subfloat[Axes 3, 4]{\label{fig:sub:GPCA34}\includegraphics[width=0.5\textwidth]{phyloseq_analysis-GPCA34}} \captionsetup{width=1.0\textwidth} \caption{First 4 axes of Correspondence Analysis (CA) of the ``Global Patterns'' dataset~\cite{Caporaso15032011}.} \label{fig:GPCA} \end{figure} A clear feature of these plots is that the feces and mock communities cluster tightly together, far away from all other samples on the first axis (CA1) in Fig.~\ref{fig:sub:GPCA12}. The skin and tongue samples separate similarly, but on the second axis. Taken together, it appears that the first two axes are best explained by the separation of human-associated ``environments'' from the other non-human environments in the dataset, with a secondary separation of tongue and skin samples from feces. We will now investigate further this top-level structure of the data, using an additional feature of correspondence analysis that allows us to compare the relative contributions of individual taxa on the same graphical space: the ``biplot''. However, because we just displayed the position of samples in the ordination and there are often many thousands of OTUs, we will focus on creating an interpretable plot of the OTUs. For creating graphics that combine the two plots, try the ``\code{biplot}'' or ``\code{split}'' option for \code{type} in \code{plot{\_}ordination()}. <>= p1 <- plot_ordination(GP, gpca, "species", color="Phylum") (p1 <- ggplot(p1$data, p1$mapping) + geom_point(size=5, alpha=0.5) + facet_wrap(~Phylum) + scale_colour_hue(legend = FALSE) ) @ <>= # Save as raster to control file size. ggsave("phyloseq_analysis-GPCAspecplot.pdf", p1, width=10, height=7) @ \begin{figure}[htbp] \centering \includegraphics[width=7in]{phyloseq_analysis-GPCAspecplot} \caption{Species plot of the ``Global Patterns'' correspondence analysis first two axes, with each phylum on a different panel (``facet''). Only the most abundant 5 phyla among the most abundant 200 taxa (cumulative, all samples) are included. Arbitrary reduction, for computational efficiency of example.} \label{fig:GPCAspecplot} \end{figure} Let's try drawing Fig~\ref{fig:GPCAspecplot} again, only this time summarizing the species points as a 2D density estimate, without any individual points. <>= (p3 <- ggplot(p1$data, p1$mapping) + geom_density2d() + facet_wrap(~Phylum) + scale_colour_hue(legend = FALSE) ) @ <>= # Do this one PDF, because it shouldn't be large file. ggsave("phyloseq_analysis-GPCAspecplotTopo.pdf", p3, width=10, height=7) @ \begin{figure}[htbp] \centering \includegraphics[width=7in]{phyloseq_analysis-GPCAspecplotTopo} \caption{Redrawn Fig.~\ref{fig:GPCAspecplot}, which is severely overplotted, as a 2-dimensional species-density topographic map, faceted in the same way.} \label{fig:GPCAspecplotTopo} \end{figure} Figs~\ref{fig:GPCAspecplot} and \ref{fig:GPCAspecplotTopo} reveal some useful patterns and interesting outliers, but what if we want a complete summary of how each phylum is represented along each axis? The following code is a way to show this using boxplots, while still avoiding the occlusion problem (points layered on top of each other), and also conveying some useful information about the pattern of taxa that contribute to the separation of human-associated samples from the other sample types. It re-uses the data that was stored in the \code{ggplot2} plot object created in the previous figure, \code{p}, and creates a new boxlot graphical summary of the positions of each phylum (Fig~\ref{fig:GPCAjitter}). <>= # Melt the species-data.frame, DF, to facet each CA axis separately mdf <- melt(p1$data[, c("CA1", "CA2", "Phylum", "Family", "Genus")], id=c("Phylum", "Family", "Genus") ) # Select some special outliers for labelling LF <- subset(mdf, variable=="CA2" & value < -1.0) # build plot: boxplot summaries of each CA-axis, with labels p <- ggplot(mdf, aes(Phylum, value, color=Phylum)) + geom_boxplot() + facet_wrap(~variable, 2) + scale_colour_hue(legend = FALSE) + theme_bw() + theme( axis.text.x = element_text(angle = -90, hjust = 0) ) # Add the text label layer, and render ggplot graphic (p <- p + geom_text(aes(Phylum, value+0.1, color=Phylum, label=Family), data=LF, vjust=0, size=2) ) @ <>= # Save as raster to control file size. #ggsave("phyloseq_analysis-GPCAjitter.png", p, width=6, height=6, dpi=75) ggsave("phyloseq_analysis-GPCAjitter.pdf", p, width=7, height=7) @ \begin{figure}[htbp] \centering \includegraphics[width=4.5in]{phyloseq_analysis-GPCAjitter} \caption{Boxplot of taxa (species in this case) of the ``Global Patterns'' CA first two axes, shaded/separated by phylum. Through this approach it is much easier to see particular species that cluster unusually relative to the rest of their phylum, for example the Bacteroidetes species (\emph{Prevotellaceae} family) that is positioned most in the negative CA2 direction toward the Tongue/Skin samples.} \label{fig:GPCAjitter} \end{figure} \clearpage One way to relate some of the high-level patterns we observed from correspondence analysis is to visualize the relative abundances of the relevant phylogenetic groups, to see if this does in fact support / explain the human/environment microbiome differences. Here is an example using the \code{plot{\_}taxa{\_}bar} function described earlier in Section~\ref{sec:barplots}. <>= (p <- plot_taxa_bar(GP, "Phylum", NULL, threshold=0.9, "human", "SampleType", facet_formula= TaxaGroup ~ .) ) @ <>= ggsave("phyloseq_analysis-GPtaxaplot.pdf", p, width=8, height=10) @ \begin{figure}[htbp] \centering \includegraphics[width=0.55\textwidth]{phyloseq_analysis-GPtaxaplot} \caption{Phylum-level comparison of relative abundance of taxa in samples that are from human microbiomes (or not).} \label{fig:GPtaxaplot} \end{figure} In Fig~\ref{fig:GPtaxaplot} we've used the \code{threshold} parameter to omit all but phyla accounting for the top 90\% of phyla in any one sample. Some patterns emerging from this display appear to be: (1) Cyanobacteria, Actinobacteria appear under-represented in human samples; (2) conversely, Firmicutes appear over-represented in human samples; (3) Acidobacteria, Verrucomicrobia appear over-represented in the fecal samples; (4) the only Crenarchaeota were observed in the Mock sample, which is not really a community but a simulated community used as a control. These are not hugely surprising based on previous biological observations from the field, but it is hopefully useful code that can be applied on other datasets that you might have. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % DPCoA \clearpage \subsubsection{Double Principle Coordinate Analysis (DPCoA)} Here is a quick example illustrating the use of Double Principal Coordinate Analysis (DPCoA~\cite{Pavoine2004523}), using the using the \code{DPCoA()} function in phyloseq, as well as the ``biplot'' option for \code{plot{\_}ordination()}. For a description that includes an applied example using the ``enterotype'' dataset and comparison with UniFrac/PCoA, see Fukuyama et al~\cite{fukuyama2012com}. <>= GP.dpcoa <- DPCoA(GP) # GP.dpcoa <- ordinate(GP, "DPCoA") # Alternative; ordinate() function pdpcoa <- plot_ordination(GP, GP.dpcoa, type="biplot", color="SampleType", shape="Phylum") shape.fac <- pdpcoa$data[, deparse(pdpcoa$mapping$shape)] man.shapes <- c(19, 21:25) names(man.shapes) <- c("samples", levels(shape.fac)[levels(shape.fac)!="samples"]) p2dpcoa <- pdpcoa + scale_shape_manual(values=man.shapes) @ <>= ggsave("phyloseq_analysis-GPdpcoaBiplot.pdf", p2dpcoa, width=9, height=7) @ \begin{figure}[htbp] \centering \includegraphics[width=0.7\textwidth]{phyloseq_analysis-GPdpcoaBiplot} \caption{A biplot representation of a Double Principal Coordinate Analysis (DPCoA), on a simplified version of the ``Global Patterns'' dataset with only the most abundant 200 OTUs included.} \label{fig:GPdpcoaBiplot} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % DISTANCE METHODS \clearpage \subsection{Distance Methods} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsubsection{\code{distance()}: Central Distance Function} Many comparisons of microbiome samples, including the graphical model (Section~\ref{sec:network}) and the PCoA analysis (Section~\ref{sec:PCoA}), require a calculation for the relative dissimilarity of one microbial community to another, or ``distance''. The phyloseq-package provides a general ``wrapper'' function for calculating ecological distance matrices between the samples in an experiment. \code{distance()} currently supports 43 method options, as well as user-provided arbitrary methods via an interface to vegan's \code{designdist()} function. Currrently only sample-wise distances are supported (the \code{type} argument), but eventually species-wise (OTU-wise) distances will be supported as well. In addition to supporting any of the method options to the three main distance functions of the \code{vegan}-package~\cite{veganpkg} --- including the 14 distances of the \code{vegdist()} function and all 24 ecological distances reviewed in Koleff et al. 2003~\cite{Koleff2003JAE} coded by \code{betadiver} --- \code{distance()} will eventually support many of the distance methods in the \code{ade4}-package~\cite{Dray:2007vs}, and others. It also supports the Fast \code{UniFrac} distance function (Section~\ref{sec:unifrac}) included in phyloseq as native \code{R} code, and a wrapper for retreiving the sample-distances from Double Principal Coordinate Analysis (DPCoA). The function takes a \code{phyloseq-class} object and an argument indicating the distance type; and it returns a \code{dist}-class distance matrix. <>= data(esophagus) distance(esophagus) # Unweighted UniFrac distance(esophagus, weighted=TRUE) # weighted UniFrac distance(esophagus, "jaccard") # vegdist jaccard distance(esophagus, "g") # betadiver method option "g" distance("help") @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % UniFrac %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %\clearpage \subsubsection{UniFrac and weighted UniFrac}\label{sec:unifrac} UniFrac is a recently-defined~\cite{Lozupone:2005gn} and popular distance metric to summarize the difference between pairs of ecological communities. All UniFrac variants use a phylogenetic tree of the relationship among taxa as central information to calculating the distance between two samples/communities. An unweighted UniFrac distance matrix only considers the presence/absence of taxa, while weighted UniFrac accounts for the relative abundance of taxa as well as their phylogenetic distance. Prior to \emph{phyloseq}, a non-parallelized, non-Fast implementation of the unweighted UniFrac was available in \R{} packages (\code{picante::unifrac}~\cite{Kembel:2010ft}). In the \emph{phyloseq} package we provide optionally-parallelized implementations of Fast UniFrac~\cite{Hamady:2009fk} (both weighted and unweighted, with plans for additional UniFrac variants), all of which return a sample-wise distance matrix from any \code{phyloseq-class} object that contains a phylogenetic tree component. The following is an example calculating the UniFrac distance (both weighted and unweighted) matrix using the ``esophagus'' example dataset: <>= data(esophagus) UniFrac(esophagus, weighted=TRUE) # distance(esophagus, weighted=TRUE) # Alternative using the distance() function UniFrac(esophagus, weighted=FALSE) # distance(esophagus) # Alternative using the distance() function @ \begin{scriptsize} \begin{samepage} <>= round( UniFrac(esophagus, weighted=TRUE), 3) round( UniFrac(esophagus, weighted=FALSE), 3) @ \end{samepage} \end{scriptsize} See the wiki-page devoted to details about calculating the UniFrac distances for your experiment. In particular, some example run-times are provided for comparison, as well as details for initializing a parallel ``back end'' to perform the computation with multiple processor cores simultaneously: \url{https://github.com/joey711/phyloseq/wiki/Fast-Parallel-UniFrac} \subsection{Hierarchical Clustering} Another potentially useful and popular way to visualize/decompose sample-distance matrices is through hierarchical clustering (e.g. \code{hclust()}). In the following example, we reproduce Figure~4 from the ``Global Patterns'' article~\cite{Caporaso15032011}, using the unweighted UniFrac distance and the UPGMA method (\code{hclust} parameter \code{method="average"}). Try \code{help("hclust")} for alternative clustering methods included in standard \code{R}. <<>>= # (Re)load UniFrac distance matrix and GlobalPatterns data data(GlobalPatterns) load("Unweighted_UniFrac.RData") # reloads GPUF variable # Manually define color-shading vector based on sample type. colorScale <- rainbow(length(levels(getVariable(GlobalPatterns, "SampleType")))) cols <- colorScale[getVariable(GlobalPatterns, "SampleType")] GP.tip.labels <- as(getVariable(GlobalPatterns, "SampleType"), "character") GP.hclust <- hclust(GPUF, method="average") @ Plot the hierarchical clustering results as a dendrogram, after first converting the \code{hclust-class} object to \code{phylo-class} tree using the \code{as.phylo()} function. <>= plot(as.phylo(GP.hclust), show.tip.label=TRUE, tip.color="white") tiplabels(GP.tip.labels, col=cols, frame="none", adj=-0.05, cex=0.7) @ Create an alternative plot, using the Jaccard distance and complete-linkage clustering (the default) instead of UPGMA. <>= jaccCLC <- hclust(distance(GlobalPatterns, "jaccard")) plot( as.phylo(jaccCLC), show.tip.label=TRUE, tip.color="white" ) tiplabels(GP.tip.labels, col=cols, frame="none", adj=-0.05, cex=0.7) @ \begin{figure}[htbp] \centering \subfloat[UniFrac / UPGMA]{\label{fig:sub:GPfig4}\includegraphics[width=0.5\textwidth]{phyloseq_analysis-GPfig4}} \subfloat[Jaccard / CLC]{\label{fig:sub:GPjaccCLC}\includegraphics[width=0.5\textwidth]{phyloseq_analysis-GPfig4jaccCLC}} \captionsetup{width=1.0\textwidth} \caption{An alternative means of summarizing a distance matrix via hierarchical clustering and plotting as an annotated dendrogram. Compare with Figure 4 from the ``Global Patterns'' article~\cite{Caporaso15032011}. Panel~\ref{fig:sub:GPfig4} represents a faithful reproduction of the original approach from the article using \code{R} utilities, while Panel~\ref{fig:sub:GPjaccCLC} is an illustration of slightly different results with different choices of distance measure and clustering algorithm. Some differences in Panel~\ref{fig:sub:GPfig4} from the original article might be explained by the \code{GlobalPatterns} dataset in \emph{phyloseq} includes the summed observations from both directions (5' and 3'), while in the article they show the results separately. Furthermore, in the article the ``mock'' community is not included in the dataset, but an extra fecal sample is included. } \label{fig:GPfig4} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % VALIDATION \clearpage \section{Validation}\label{sec:validation} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Multiple Testing. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Multiple Inference Correction}\label{sec:multtest} The \emph{phyloseq} package includes support for significance testing with correction for multiple inference. This is particularly important when testing for significance of the abundance patterns among thousands of microbes (OTUs). This is a common question of phylogenetic sequence data, that is, ``what is the subset of microbes that significantly correlate with a scientifically-interesting sample variable''. Although we plan to include support for other types of multiple-inference corrected tests, this ``which taxa?'' test is the only directly supported test at the moment. Our initial implementation of this support is via an extension to the \code{mt.maxT()} and \code{mt.minP} functions in the \emph{multtest} package~\cite{multtest} (Bioconductor repo). This uses permutation-adjusted p-values in a multiple testing procedure that provides strong control of the Family-Wise Error Rate (FWER) among the taxa being tested. The user specifies a sample-variable among the \code{sampleData} component, or alternatively provides a sample-wise vector or factor that classifies the samples into groups. Additional optional parameters can be provided that specify the type of test (``\code{test=}''), the sidedness of the test (``\code{side=}''), as well as some additional technical/computational parameters. In the following example we test whether a particular genera correlates with the Enterotype classification of each sample. Note that we have to specify an alternate test, \code{test="f"}, because the default test (t-test) can only handle up to 2 classes, and there are three enterotype classes. <>= data(enterotype) # Filter samples that don't have Enterotype classification. x <- subset_samples(enterotype, !is.na(Enterotype)) # Calculate the multiple-inference-adjusted P-values ent.p.table <- mt(x, "Enterotype", test="f") print(head(ent.p.table, 10)) @ \begin{table}[htbp] \centering \begin{tabular}{lrrrrr} \hline genera & index & teststat & rawp & adjp & plower \\ \hline Prevotella & 207 &344.73 & 0.0001 & 0.0158 & 0.0001 \\ Bacteroides & 203 & 85.01 & 0.0001 & 0.0158 & 0.0001 \\ Blautia & 187 & 19.52 & 0.0001 & 0.0158 & 0.0001 \\ Bryantella & 503 & 16.38 & 0.0001 & 0.0158 & 0.0001 \\ Parabacteroides & 205 & 12.89 & 0.0001 & 0.0158 & 0.0001 \\ Alistipes & 208 & 8.71 & 0.0002 & 0.0301 & 0.0158 \\ Bifidobacterium & 240 & 9.29 & 0.0004 & 0.0560 & 0.0430 \\ Holdemania & 201 & 7.64 & 0.0009 & 0.1146 & 0.1031 \\ Dorea & 182 & 7.44 & 0.0009 & 0.1146 & 0.1031 \\ Phascolarctobacterium & 513 & 7.01 & 0.0014 & 0.1695 & 0.1585 \\ \hline \end{tabular} \caption{For computational efficiency this calculation was run separately, and results embedded here.} \label{tab:entmt} \end{table} Not surprisingly, \emph{Prevotella} and \emph{Bacteroides} top the list, since they were major components of the ``Enterotype'' classification. Please also note that we are planning to incorporate other tools from \emph{multtest} that would allow for other types of multiple-inference correction procedures, for instance, strong control of the False Discovery Rate (FDR). These additional options will be made available shortly. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \clearpage \section{Further Examples} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% This vignette is limited in size and scope because of constraints on the cumulative-size of packages allowed in Bioconductor. \subsection{phyloseq Wiki} For further examples presented in html/wiki format, please see: \url{https://github.com/joey711/phyloseq/wiki/Graphics-Examples} \subsection{phyloseq Vignette Gallery} For additional and/or updated vignettes not present in the official Bioconductor release, please see: \url{https://github.com/joey711/phyloseq/wiki/Vignettes} \subsection{phyloseq Feedback} For feature requests, bug reports, and other suggestions and issues, please go to: \url{https://github.com/joey711/phyloseq/issues} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Appendices %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %\clearpage %\appendix %\section{Details} %\section{Caveats} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Bibliography. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %\clearpage %\bibliographystyle{ws-procs11x85} \bibliography{phyloseq_analysis} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % In case you need to hide results for some reason %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % <>= % # code( with results that need hiding ) % @ % \end{document}