1. Anatomy of an MPSE

MicrobiotaProcess introduces the MPSE S4 class, which inherits from the SummarizedExperiment (Morgan et al. 2021) class. The assays slot stores the rectangular abundance matrices of features from a microbiome experiment. The colData slot stores the sample metadata and sample-level results from downstream analysis. The rowData slot stores the feature metadata and feature-level results from downstream analysis. Compared to the SummarizedExperiment object, MPSE introduces the following additional slots:

The structure of the MPSE class.

2. Overview of the design of the MicrobiotaProcess package

With this data structure, MicrobiotaProcess is more interoperable with the existing computing ecosystem. For example, the slots inherited from SummarizedExperiment can be extracted via the methods provided by SummarizedExperiment. The taxatree and otutree can be extracted via mp_extract_tree, and they are compatible with the ggtree (Yu et al. 2017), ggtreeExtra (Xu et al. 2021), treeio (Wang et al. 2020) and tidytree (Yu 2021) ecosystem because both are treedata objects, a data structure used directly by these packages.
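As a minimal sketch of this interoperability, the slots can be read with the SummarizedExperiment accessors and the taxonomy tree with mp_extract_tree. The sketch assumes the example dataset mouse.time.mpse is bundled with the package, that its assay is named Abundance (as in the printed output later in this document), and that mp_extract_tree accepts type = "taxatree".

```r
library(MicrobiotaProcess)
library(SummarizedExperiment)

# Example MPSE object, assumed to be bundled with MicrobiotaProcess.
data(mouse.time.mpse)

# Slots inherited from SummarizedExperiment are read with its own accessors.
abund    <- assay(mouse.time.mpse, "Abundance")  # feature x sample matrix
samples  <- colData(mouse.time.mpse)             # one row per sample
features <- rowData(mouse.time.mpse)             # one row per feature
dim(abund)

# The taxonomy tree is returned as a treedata object,
# which the ggtree / treeio / tidytree packages work on directly.
taxa_tree <- mp_extract_tree(mouse.time.mpse, type = "taxatree")
class(taxa_tree)
```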

Moreover, the output of upstream microbiome analysis tools such as qiime2 (Bolyen et al. 2019), dada2 (Callahan et al. 2016) and MetaPhlAn (Beghini et al. 2021), as well as other classes used to store microbiome data (SummarizedExperiment (Morgan et al. 2021), phyloseq (McMurdie and Holmes 2013) and TreeSummarizedExperiment (Huang et al. 2021)), can be loaded or converted into the MPSE class.

In addition, MicrobiotaProcess introduces a tidy microbiome data structure paradigm and analysis grammar. It provides a wide variety of microbiome analysis procedures under a unified, tidy-like framework. We believe MicrobiotaProcess can improve the efficiency of related research, and it also bridges microbiome data analysis with the tidyverse (Wickham et al. 2019).
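The tidy view can be illustrated in plain base R, independently of the package. The toy objects below (abundance, sample_meta, taxonomy) are hypothetical and only mimic the idea: every OTU-sample pair becomes one row, with sample and feature annotations attached as extra columns. This is also why the MPSE-tibble printed later for 232 OTUs and 19 samples has 232 × 19 = 4,408 rows. It is an illustration of the layout, not the package's implementation.

```r
# Toy abundance matrix: 3 features (rows) x 2 samples (columns), filled column-wise.
abundance <- matrix(
  c(5L, 0L, 3L, 2L, 7L, 1L), nrow = 3,
  dimnames = list(paste0("OTU_", 1:3), c("S1", "S2"))
)
sample_meta <- data.frame(Sample = c("S1", "S2"), time = c("Early", "Late"))
taxonomy    <- data.frame(OTU = paste0("OTU_", 1:3),
                          Phylum = c("p__A", "p__A", "p__B"))

# One row per OTU-sample pair; as.vector() unrolls the matrix column by column,
# so OTU names cycle fastest and each sample name is repeated nrow() times.
long <- data.frame(
  OTU       = rep(rownames(abundance), times = ncol(abundance)),
  Sample    = rep(colnames(abundance), each = nrow(abundance)),
  Abundance = as.vector(abundance)
)

# Attach sample-level and feature-level annotations as extra columns.
long <- merge(long, sample_meta, by = "Sample")
long <- merge(long, taxonomy, by = "OTU")

nrow(long)  # number of OTUs times number of samples
long
```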

Overview of the design of the MicrobiotaProcess package

3. MicrobiotaProcess profiling

3.1 bridging other tools

MicrobiotaProcess provides several functions to parse the output of upstream microbiome analysis tools, such as qiime2 (Bolyen et al. 2019), dada2 (Callahan et al. 2016) and MetaPhlAn (Beghini et al. 2021), and return an MPSE object. Some Bioconductor classes, such as phyloseq (McMurdie and Holmes 2013), TreeSummarizedExperiment (Huang et al. 2021) and SummarizedExperiment (Morgan et al. 2021), can also be converted to MPSE via as.MPSE().
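The sketch below shows what such imports might look like. The importer names mp_import_dada2, mp_import_qiime2 and mp_import_metaphlan, their argument names, and the example file names under the package's extdata directory are assumptions not stated in the text above, so check them against the installed version. The as.MPSE() call is left as a comment because ps stands for a hypothetical phyloseq object.

```r
library(MicrobiotaProcess)

# Helper for locating example files assumed to ship with the package.
ext <- function(...) system.file("extdata", ..., package = "MicrobiotaProcess")

# dada2: sequence and taxonomy tables saved as RDS, plus a sample sheet.
mpse1 <- mp_import_dada2(
  seqtab   = readRDS(ext("seqtab.nochim.rds")),
  taxatab  = readRDS(ext("taxa_tab.rds")),
  sampleda = ext("mouse.time.dada2.txt")
)

# qiime2: feature table and taxonomy as .qza artifacts, plus a mapping file.
mpse2 <- mp_import_qiime2(
  otuqza      = ext("table.qza"),
  taxaqza     = ext("taxa.qza"),
  mapfilename = ext("metadata_qza.txt")
)

# MetaPhlAn: merged profile table, plus a sample sheet.
mpse3 <- mp_import_metaphlan(
  profile     = ext("MetaPhlAn", "metaphlan_test.txt"),
  mapfilename = ext("MetaPhlAn", "sample_test.txt")
)

# Conversion from another Bioconductor class (ps is a hypothetical phyloseq object):
# mpse4 <- as.MPSE(ps)

mpse1
```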

## # A MPSE-tibble (MPSE object) abstraction: 4,408 × 11
## # OTU=232 | Samples=19 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus, Species
##    OTU    Sample Abundance time  Kingdom Phylum Class Order Family Genus Species
##    <chr>  <chr>      <int> <chr> <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>  
##  1 OTU_1  F3D0         579 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  2 OTU_2  F3D0         345 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  3 OTU_3  F3D0         449 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  4 OTU_4  F3D0         430 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  5 OTU_5  F3D0         154 Early k__Bac… p__Ba… c__B… o__B… f__Ba… g__B… s__un_…
##  6 OTU_6  F3D0         470 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  7 OTU_7  F3D0         282 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  8 OTU_8  F3D0         184 Early k__Bac… p__Ba… c__B… o__B… f__Ri… g__A… s__un_…
##  9 OTU_9  F3D0          45 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
## 10 OTU_10 F3D0         158 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
## # … with 4,398 more rows
## # A MPSE-tibble (MPSE object) abstraction: 12,006 × 32
## # OTU=138 | Samples=87 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus, Speies
##    OTU    Sample Abundance Sample_Name_s BarcodeSequence LinkerPrimerSeq… Subject
##    <chr>  <chr>      <dbl> <chr>         <chr>           <chr>            <chr>  
##  1 OTU_1  ERR13…       901 LR53          AGTGTCGATTCG    TATGGTAATTGT     Patient
##  2 OTU_2  ERR13…       877 LR53          AGTGTCGATTCG    TATGGTAATTGT     Patient
##  3 OTU_3  ERR13…       239 LR53          AGTGTCGATTCG    TATGGTAATTGT     Patient
##  4 OTU_4  ERR13…       201 LR53          AGTGTCGATTCG    TATGGTAATTGT     Patient
##  5 OTU_5  ERR13…       168 LR53          AGTGTCGATTCG    TATGGTAATTGT     Patient
##  6 OTU_6  ERR13…       115 LR53          AGTGTCGATTCG    TATGGTAATTGT     Patient
##  7 OTU_7  ERR13…       107 LR53          AGTGTCGATTCG    TATGGTAATTGT     Patient
##  8 OTU_8  ERR13…        84 LR53          AGTGTCGATTCG    TATGGTAATTGT     Patient
##  9 OTU_9  ERR13…        67 LR53          AGTGTCGATTCG    TATGGTAATTGT     Patient
## 10 OTU_10 ERR13…        67 LR53          AGTGTCGATTCG    TATGGTAATTGT     Patient
## # … with 11,996 more rows, and 25 more variables: Sex <chr>, Age <int>, Pittsburgh <chr>, Bell <dbl>,
## #   BMI <dbl>, sCD14ugml <dbl>, LBPugml <dbl>, LPSpgml <dbl>, IFABPpgml <dbl>,
## #   Physical_functioning <dbl>, Role_physical <dbl>, Role_emotional <dbl>,
## #   Energy_fatigue <dbl>, Emotional_well_being <dbl>, Social_functioning <dbl>,
## #   Pain <dbl>, General_health <dbl>, Description <lgl>, Kingdom <chr>,
## #   Phylum <chr>, Class <chr>, Order <chr>, Family <chr>, Genus <chr>,
## #   Speies <chr>
## # A MPSE-tibble (MPSE object) abstraction: 5,260 × 11
## # OTU=263 | Samples=20 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
##    OTU    Sample  Abundance group taxid  Kingdom Phylum Class Order Family Genus
##    <chr>  <chr>       <dbl> <chr> <chr>  <chr>   <chr>  <chr> <chr> <chr>  <chr>
##  1 s__Me… GupDM_…     0.596 testA 2157|… k__Arc… p__Eu… c__M… o__M… f__Me… g__M…
##  2 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
##  3 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
##  4 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
##  5 s__Bi… GupDM_…     0.948 testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
##  6 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
##  7 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
##  8 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
##  9 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
## 10 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
## # … with 5,250 more rows
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 99 taxa and 12 samples ]
## sample_data() Sample Data:       [ 12 samples by 2 sample variables ]
## tax_table()   Taxonomy Table:    [ 99 taxa by 7 taxonomic ranks ]
## # A MPSE-tibble (MPSE object) abstraction: 1,188 × 12
## # OTU=99 | Samples=12 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus, Species
##    OTU    Sample Abundance sample group Kingdom  Phylum Class Order Family Genus
##    <chr>  <chr>      <int> <fct>  <fct> <chr>    <chr>  <chr> <chr> <chr>  <chr>
##  1 OTU_1  B11         3995 B11    B     k__Bact… p__Fi… c__B… o__L… f__La… g__L…
##  2 OTU_10 B11          605 B11    B     k__Bact… p__Fi… c__B… o__L… f__La… g__L…
##  3 OTU_1… B11           57 B11    B     k__Bact… p__Fi… c__C… o__C… f__Ru… g__R…
##  4 OTU_1… B11            0 B11    B     k__Bact… p__Fi… c__C… o__C… f__Ru… g__I…
##  5 OTU_1… B11            5 B11    B     k__Bact… p__Fi… c__E… o__E… f__Er… g__E…
##  6 OTU_1… B11            1 B11    B     k__Bact… p__Fi… c__C… o__C… f__Ru… g__R…
##  7 OTU_1… B11            0 B11    B     k__Bact… p__Ba… c__B… o__B… f__Pr… g__P…
##  8 OTU_1… B11            0 B11    B     k__Bact… p__Fi… c__C… o__C… f__Ru… g__C…
##  9 OTU_1… B11            0 B11    B     k__Bact… p__Ba… c__B… o__S… f__Sp… g__S…
## 10 OTU_1… B11           34 B11    B     k__Bact… p__Fi… c__C… o__C… f__Ru… g__R…
## # … with 1,178 more rows, and 1 more variable: Species <chr>

3.2 alpha diversity analysis

Rarefaction, a technique based on subsampling, is used to compensate for the effect of sample size on the number of units observed in a sample (Siegel 2004). MicrobiotaProcess provides mp_cal_rarecurve and mp_plot_rarecurve to calculate and plot rarefaction curves based on rrarefy from vegan (Oksanen et al. 2019).
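A minimal sketch of this workflow follows. It assumes the bundled example dataset mouse.time.mpse and a rarefying step named mp_rrarefy that adds the RareAbundance assay. The chunks value and the seed are arbitrary choices for illustration. The column name RareAbundanceRarecurve matches the printed output below.

```r
library(MicrobiotaProcess)

data(mouse.time.mpse)
set.seed(1024)  # rarefaction subsamples at random; a fixed seed makes reruns comparable

# Subsample every sample to a common depth; the result is stored as a new assay.
mpse <- mp_rrarefy(mouse.time.mpse)

# Compute the rarefaction curves from the rarefied counts.
# chunks is assumed to control how many subsampling depths are evaluated.
mpse <- mp_cal_rarecurve(mpse, .abundance = RareAbundance, chunks = 400)

# Plot the curves; .rare names the list-column created by mp_cal_rarecurve.
p <- mp_plot_rarecurve(mpse, .rare = RareAbundanceRarecurve, .alpha = Observe)
p
```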

## # A MPSE-tibble (MPSE object) abstraction: 4,408 × 11
## # OTU=232 | Samples=19 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus, Species
##    OTU    Sample Abundance time  Kingdom Phylum Class Order Family Genus Species
##    <chr>  <chr>      <int> <chr> <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>  
##  1 OTU_1  F3D0         579 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  2 OTU_2  F3D0         345 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  3 OTU_3  F3D0         449 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  4 OTU_4  F3D0         430 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  5 OTU_5  F3D0         154 Early k__Bac… p__Ba… c__B… o__B… f__Ba… g__B… s__un_…
##  6 OTU_6  F3D0         470 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  7 OTU_7  F3D0         282 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
##  8 OTU_8  F3D0         184 Early k__Bac… p__Ba… c__B… o__B… f__Ri… g__A… s__un_…
##  9 OTU_9  F3D0          45 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
## 10 OTU_10 F3D0         158 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
## # … with 4,398 more rows
## # A MPSE-tibble (MPSE object) abstraction: 4,408 × 13
## # OTU=232 | Samples=19 | Assays=Abundance, RareAbundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus, Species
##    OTU    Sample Abundance RareAbundance time  RareAbundanceRarecurve
##    <chr>  <chr>      <int>         <int> <chr> <list>                
##  1 OTU_1  F3D0         579           214 Early <tibble [2,520 × 4]>  
##  2 OTU_2  F3D0         345           116 Early <tibble [2,520 × 4]>  
##  3 OTU_3  F3D0         449           179 Early <tibble [2,520 × 4]>  
##  4 OTU_4  F3D0         430           167 Early <tibble [2,520 × 4]>  
##  5 OTU_5  F3D0         154            54 Early <tibble [2,520 × 4]>  
##  6 OTU_6  F3D0         470           174 Early <tibble [2,520 × 4]>  
##  7 OTU_7  F3D0         282           115 Early <tibble [2,520 × 4]>  
##  8 OTU_8  F3D0         184            74 Early <tibble [2,520 × 4]>  
##  9 OTU_9  F3D0          45            16 Early <tibble [2,520 × 4]>  
## 10 OTU_10 F3D0         158            59 Early <tibble [2,520 × 4]>  
##    Kingdom     Phylum           Class          Order           
##    <chr>       <chr>            <chr>          <chr>           
##  1 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  2 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  3 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  4 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  5 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  6 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  7 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  8 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##  9 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
## 10 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales
##    Family            Genus                   Species                
##    <chr>             <chr>                   <chr>                  
##  1 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
##  2 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
##  3 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
##  4 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
##  5 f__Bacteroidaceae g__Bacteroides          s__un_g__Bacteroides   
##  6 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
##  7 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
##  8 f__Rikenellaceae  g__Alistipes            s__un_g__Alistipes     
##  9 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
## 10 f__Muribaculaceae g__un_f__Muribaculaceae s__un_f__Muribaculaceae
## # … with 4,398 more rows
The rarefaction of samples or groups

3.3 alpha index calculation and visualization

Alpha diversity indices estimate the species richness and evenness of a community. MicrobiotaProcess provides mp_cal_alpha to calculate alpha indices (Observe, Chao1, ACE, Shannon, Simpson and J (Pielou’s evenness)) and mp_plot_alpha to visualize the results.
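The two functions might be used as sketched below. The sketch restarts from the bundled example dataset mouse.time.mpse so that it is self-contained. The rarefying step mp_rrarefy and the argument names .abundance, .group and .alpha are assumptions to verify against the installed version.

```r
library(MicrobiotaProcess)

data(mouse.time.mpse)
set.seed(1024)  # the rarefying step is random

# Alpha indices are computed here on rarefied counts.
mpse <- mp_rrarefy(mouse.time.mpse)
mpse <- mp_cal_alpha(mpse, .abundance = RareAbundance)

# Compare two of the indices between the levels of the time variable.
p <- mp_plot_alpha(mpse, .group = time, .alpha = c(Observe, Shannon))
p
```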

## # A MPSE-tibble (MPSE object) abstraction: 4,408 × 19
## # OTU=232 | Samples=19 | Assays=Abundance, RareAbundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus, Species
##    OTU    Sample Abundance RareAbundance time  RareAbundanceRarec… Observe Chao1
##    <chr>  <chr>      <int>         <int> <chr> <list>                <dbl> <dbl>
##  1 OTU_1  F3D0         579           214 Early <tibble [2,520 × 4…     104  104.
##  2 OTU_2  F3D0         345           116 Early <tibble [2,520 × 4…     104  104.
##  3 OTU_3  F3D0         449           179 Early <tibble [2,520 × 4…     104  104.
##  4 OTU_4  F3D0         430           167 Early <tibble [2,520 × 4…     104  104.
##  5 OTU_5  F3D0         154            54 Early <tibble [2,520 × 4…     104  104.
##  6 OTU_6  F3D0         470           174 Early <tibble [2,520 × 4…     104  104.
##  7 OTU_7  F3D0         282           115 Early <tibble [2,520 × 4…     104  104.
##  8 OTU_8  F3D0         184            74 Early <tibble [2,520 × 4…     104  104.
##  9 OTU_9  F3D0          45            16 Early <tibble [2,520 × 4…     104  104.
## 10 OTU_10 F3D0         158            59 Early <tibble [2,520 × 4…     104  104.
## # … with 4,398 more rows, and 11 more variables: ACE <dbl>, Shannon <dbl>, Simpson <dbl>, J <dbl>,
## #   Kingdom <chr>, Phylum <chr>, Class <chr>, Order <chr>, Family <chr>,
## #   Genus <chr>, Species <chr>
The alpha diversity comparison

Users can extract the result of mp_cal_alpha with mp_extract_sample() and visualize it manually; see the example in the mp_cal_alpha documentation.
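One possible manual route is sketched here with base R only. It assumes that mp_extract_sample() returns one row per sample with the metadata (here time) and the alpha indices as columns, and it uses the bundled example dataset. The median summary and the boxplot are illustrative choices, not the package's own visualization.

```r
library(MicrobiotaProcess)

data(mouse.time.mpse)
set.seed(1024)  # the rarefying step is random
mpse <- mp_rrarefy(mouse.time.mpse)
mpse <- mp_cal_alpha(mpse, .abundance = RareAbundance)

# Sample-level table: metadata plus the alpha indices.
alpha_tbl <- as.data.frame(mp_extract_sample(mpse))

# Manual summary: median Shannon index per time group.
aggregate(Shannon ~ time, data = alpha_tbl, FUN = median)

# Manual plot: distribution of the Shannon index per time group.
boxplot(Shannon ~ time, data = alpha_tbl, ylab = "Shannon index")
```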

3.4 The visualization of taxonomy abundance

MicrobiotaProcess provides mp_cal_abundance and mp_plot_abundance to calculate and plot the composition of species communities, and mp_extract_abundance to extract the abundance at a specific taxonomic level. Users can also extract the abundance table for external analysis, such as manual visualization (see the example of mp_cal_abundance).
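A sketch of how these three functions could fit together is given below. It again starts from the bundled example dataset. The argument names (.abundance, .group, taxa.class, topn) and the choice of Phylum and topn = 20 are assumptions for illustration.

```r
library(MicrobiotaProcess)

data(mouse.time.mpse)
set.seed(1024)  # the rarefying step is random
mpse <- mp_rrarefy(mouse.time.mpse)

# Relative abundance per sample, computed from the rarefied counts.
mpse <- mp_cal_abundance(mpse, .abundance = RareAbundance)
# The same, aggregated per level of the time variable.
mpse <- mp_cal_abundance(mpse, .abundance = RareAbundance, .group = time)

# Abundance table at one taxonomic level, for external analysis.
phylum_tbl <- mp_extract_abundance(mpse, taxa.class = "Phylum")

# Composition plot of the most abundant phyla, grouped by time.
p <- mp_plot_abundance(mpse, .abundance = RareAbundance, .group = time,
                       taxa.class = Phylum, topn = 20)
p
```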

## # A MPSE-tibble (MPSE object) abstraction: 4,408 × 19
## # OTU=232 | Samples=19 | Assays=Abundance, RareAbundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus, Species
##    OTU    Sample Abundance RareAbundance time  RareAbundanceRarec… Observe Chao1
##    <chr>  <chr>      <int>         <int> <chr> <list>                <dbl> <dbl>
##  1 OTU_1  F3D0         579           214 Early <tibble [2,520 × 4…     104  104.
##  2 OTU_2  F3D0         345           116 Early <tibble [2,520 × 4…     104  104.
##  3 OTU_3  F3D0         449           179 Early <tibble [2,520 × 4…     104  104.
##  4 OTU_4  F3D0         430           167 Early <tibble [2,520 × 4…     104  104.
##  5 OTU_5  F3D0         154            54 Early <tibble [2,520 × 4…     104  104.
##  6 OTU_6  F3D0         470           174 Early <tibble [2,520 × 4…     104  104.
##  7 OTU_7  F3D0         282           115 Early <tibble [2,520 × 4…     104  104.
##  8 OTU_8  F3D0         184            74 Early <tibble [2,520 × 4…     104  104.
##  9 OTU_9  F3D0          45            16 Early <tibble [2,520 × 4…     104  104.
## 10 OTU_10 F3D0         158            59 Early <tibble [2,520 × 4…     104  104.
## # … with 4,398 more rows, and 11 more variables: ACE <dbl>, Shannon <dbl>, Simpson <dbl>, J <dbl>,
## #   Kingdom <chr>, Phylum <chr>, Class <chr>, Order <chr>, Family <chr>,
## #   Genus <chr>, Species <chr>
## # A MPSE-tibble (MPSE object) abstraction: 4,408 × 20
## # OTU=232 | Samples=19 | Assays=Abundance, RareAbundance, RelRareAbundanceBySample | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus, Species
##    OTU    Sample Abundance RareAbundance RelRareAbundanceBySample time  RareAbundanceRa…
##    <chr>  <chr>      <int>         <int>                    <dbl> <chr> <list>          
##  1 OTU_1  F3D0         579           214                    8.50  Early <tibble [2,520 …
##  2 OTU_2  F3D0         345           116                    4.61  Early <tibble [2,520 …
##  3 OTU_3  F3D0         449           179                    7.11  Early <tibble [2,520 …
##  4 OTU_4  F3D0         430           167                    6.63  Early <tibble [2,520 …
##  5 OTU_5  F3D0         154            54                    2.14  Early <tibble [2,520 …
##  6 OTU_6  F3D0         470           174                    6.91  Early <tibble [2,520 …
##  7 OTU_7  F3D0         282           115                    4.57  Early <tibble [2,520 …
##  8 OTU_8  F3D0         184            74                    2.94  Early <tibble [2,520 …
##  9 OTU_9  F3D0          45            16                    0.635 Early <tibble [2,520 …
## 10 OTU_10 F3D0         158            59                    2.34  Early <tibble [2,520 …
## # … with 4,398 more rows, and 13 more variables: Observe <dbl>, Chao1 <dbl>, ACE <dbl>,
## #   Shannon <dbl>, Simpson <dbl>, J <dbl>, Kingdom <chr>, Phylum <chr>,
## #   Class <chr>, Order <chr>, Family <chr>, Genus <chr>, Species <chr>