Chapter 10 Differential Analysis

Identifying the significant taxa between case and control group is necessary. There are too many differential analysis (DA) approaches to choose (Nearing et al. 2022), we provide some of them which focusing on microbial data, including:

Table 10.1: Differential analysis tools in XMAS.
Tool.version. Input Normalization Transformation Distribution MicrobialData
ALDEx2 (1.26.0) Counts None CLR Dirichlet-multinormial 16s
limma voom (3.50.1) Counts/Relative None/TMM Log; Precision weighting Normal 16s/MGS
mbzinb (0.2) Counts RLE None Zero-inflated negative binomial 16s
omnibus (0.2) Counts GMPR(Geometric Mean of Pairwise Ratios) None Zero-inflated negative binomial 16s
RAIDA (1.0) Counts None zero-inflated Log Modified t-test 16s
Wilcox(rare/CLR) Counts/Relative None None/CLR Non-parametric 16s/MGS
LEfSe Rarefied Counts/Relative TSS None Non-parametric 16s/MGS
t-test (rare) Counts/Relative None None Normal 16s/MGS
metagenomeSeq (1.36.0) Counts CSS Log Zero-inflated (log) Normal 16s
DESeq2 (1.34.0) Counts RLE None Negative binomial 16s
edgeR (3.36.0) Counts RLE/TMM None Negative binomial 16s
ANCOM-II (2.1) Counts/Relative None ALR Non-parametric 16s/MGS
Corncob (0.2.0) Counts None None Beta-binomial 16s
MaAslin2 (1.8.0) Counts/Relative None/TSS AST Normal 16s/MGS

Outline of this Chapter:

10.1 Loading Packages

library(XMAS2)
library(dplyr)
library(tibble)
library(phyloseq)
library(ggplot2)
library(ggpubr)

10.2 Importing Data

10.2.1 16s data

data("dada2_ps")

# step1: Removing samples of specific group in phyloseq-class object
dada2_ps_remove_BRS <- get_GroupPhyloseq(
                     ps = dada2_ps,
                     group = "Group",
                     group_names = "QC",
                     discard = TRUE)

# step2: Rarefying counts in phyloseq-class object
dada2_ps_rarefy <- norm_rarefy(object = dada2_ps_remove_BRS, 
                               size = 51181)

# step3: Extracting specific taxa phyloseq-class object 
dada2_ps_rare_genus <- summarize_taxa(ps = dada2_ps_rarefy, 
                                      taxa_level = "Genus", 
                                      absolute = TRUE)

# step4: Aggregating low relative abundance or unclassified taxa into others
# dada2_ps_genus_LRA <- summarize_LowAbundance_taxa(ps = dada2_ps_rare_genus, 
#                                                   cutoff = 10, 
#                                                   unclass = TRUE)

# step4: Filtering the low relative abundance or unclassified taxa by the threshold
dada2_ps_genus_filter <- run_filter(ps = dada2_ps_rare_genus, 
                                    cutoff = 10, 
                                    unclass = TRUE)

# step5: Trimming the taxa with low occurrence less than threshold
dada2_ps_genus_filter_trim <- run_trim(object = dada2_ps_genus_filter, 
                                       cutoff = 0.2, 
                                       trim = "feature")
dada2_ps_genus_filter_trim
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 100 taxa and 23 samples ]
## sample_data() Sample Data:       [ 23 samples by 1 sample variables ]
## tax_table()   Taxonomy Table:    [ 100 taxa by 6 taxonomic ranks ]

10.2.2 Metagenomic data

data("metaphlan2_ps")

# step1: Removing samples of specific group in phyloseq-class object
metaphlan2_ps_remove_BRS <- get_GroupPhyloseq(
                     ps = metaphlan2_ps,
                     group = "Group",
                     group_names = "QC",
                     discard = TRUE)

# step2: Extracting specific taxa phyloseq-class object 
metaphlan2_ps_genus <- summarize_taxa(ps = metaphlan2_ps_remove_BRS, 
                                      taxa_level = "Genus")

# step3: Aggregating low relative abundance or unclassified taxa into others
# dada2_ps_genus_LRA <- summarize_LowAbundance_taxa(ps = dada2_ps_genus, 
#                                                   cutoff = 10, 
#                                                   unclass = TRUE)

# step4: Filtering the low relative abundance or unclassified taxa by the threshold
metaphlan2_ps_genus_filter <- run_filter(ps = metaphlan2_ps_genus, 
                                         cutoff = 1e-4, 
                                         unclass = TRUE)

# step5: Trimming the taxa with low occurrence less than threshold
metaphlan2_ps_genus_filter_trim <- run_trim(object = metaphlan2_ps_genus_filter, 
                                            cutoff = 0.2, 
                                            trim = "feature")
metaphlan2_ps_genus_filter_trim
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 54 taxa and 22 samples ]
## sample_data() Sample Data:       [ 22 samples by 2 sample variables ]
## tax_table()   Taxonomy Table:    [ 54 taxa by 6 taxonomic ranks ]

10.3 Amplicon sequencing microbial data (16s)

10.3.1 ALDEx2

ALDEx2 package is from Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis (Fernandes et al. 2014), and its principle is using log-ratio transformation and statistical testing to find the significant Taxa. (Caution: the otu_table must be integers).

run_aldex provides 11 parameters. For instance, norm and transform are used to normalization and transformation input data. More details to see help(run_aldex).

DA_ALDEx2 <- run_aldex(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"),
                method = "t.test")
## |------------(25%)----------(50%)----------(75%)----------|
colnames(DA_ALDEx2)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "Pvalue"                            "AdjustedPvalue"                   
##  [7] "Median CLR \n(All)"                "Median CLR\nAA"                    "Median CLR\nBB"                   
## [10] "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"           "Median Abundance\nAA"             
## [13] "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"   "Mean Abundance\n(All)"            
## [16] "Mean Abundance\nAA"                "Mean Abundance\nBB"                "Occurrence (100%)\n(All)"         
## [19] "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"             "Odds Ratio (95% CI)"
head(DA_ALDEx2)
##               TaxaID         Block Enrichment  EffectSize    Pvalue AdjustedPvalue Median CLR \n(All) Median CLR\nAA
## 1 g__Acidaminococcus 9_AA vs 14_BB  Nonsignif  0.03216861 0.6548749      0.9018534         -3.4244631      -3.555905
## 2     g__Actinomyces 9_AA vs 14_BB  Nonsignif  0.09330854 0.6658567      0.9138848          2.7645196       2.167638
## 3   g__Adlercreutzia 9_AA vs 14_BB  Nonsignif  0.30832148 0.3184500      0.7716171         -0.7734251      -2.223805
## 4    g__Agathobacter 9_AA vs 14_BB  Nonsignif -0.08089118 0.6408318      0.9042170          5.4463483       6.081374
## 5     g__Akkermansia 9_AA vs 14_BB  Nonsignif -0.15579201 0.5972170      0.8723074         -2.9580004      -2.541838
## 6       g__Alistipes 9_AA vs 14_BB  Nonsignif -0.09579649 0.6356202      0.9052481          3.7226931       4.768802
##   Median CLR\nBB Log2FoldChange (Median)\nAA_vs_BB Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB
## 1     -3.3102684                                NA                       0                    0                  0.0
## 2      3.1218379                        -0.9577718                      30                   26                 50.5
## 3      0.3041529                                NA                       0                    0                  9.0
## 4      5.3302183                         0.1524904                     316                  329                296.0
## 5     -3.2004909                                NA                       0                    0                  0.0
## 6      3.4586946                        -0.1085245                      64                   64                 69.0
##   Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All) Mean Abundance\nAA Mean Abundance\nBB Occurrence (100%)\n(All)
## 1                       1.7051442              58.82609         101.777778           31.21429                    21.74
## 2                      -1.3107676              50.91304          26.777778           66.42857                    82.61
## 3                      -2.4683647              19.21739           5.111111           28.28571                    43.48
## 4                       1.7488297             996.26087        1740.444444          517.85714                    65.22
## 5                      -2.1676332              93.78261          30.000000          134.78571                    21.74
## 6                       0.2000385             163.86957         177.888889          154.85714                    73.91
##   Occurrence (100%)\nAA Occurrence (100%)\nBB Odds Ratio (95% CI)
## 1                 22.22                 21.43    0.67 (-0.11;1.5)
## 2                 88.89                 78.57       4.1 (6.8;1.3)
## 3                 33.33                 50.00        8.6 (13;4.3)
## 4                 55.56                 71.43     0.39 (-1.4;2.2)
## 5                 22.22                 21.43      1.5 (2.3;0.71)
## 6                 77.78                 71.43     0.88 (0.64;1.1)

Results:

The results comprises more than 10 columns, and the details as follow:

  1. TaxaID: taxa name;

  2. Block: groups’ names and numbers;

  3. Enrichment: enriched direction based on Median Abundance and AdjustedPvalue;

  4. EffectSize: effect size of taxa by ALDEx2;

  5. Pvalue and AdjustedPvalue: significant level of Pvalue and Adjusted-pvalue by ALDEx2;

  6. Median CLR (All)/(group AA)/(group BB): Median CLR (normalization by ALDEx2) in all, group AA and group BB, respectively;

  7. **Log2FoldChange (Median)_vs_BB**: Log2FoldChange (Median Abundance) between group AA and group BB;

  8. Median Abundance (All)/(group AA)/(group BB): Median Abundance in all, group AA and group BB, respectively;

  9. **Log2FoldChange (Mean)_vs_BB**: Log2FoldChange (Mean Abundance) between group AA and group BB;

  10. Mean Abundance (All)/(group AA)/(group BB): Mean Abundance in all, group AA and group BB, respectively;

  11. Occurrence (All)/(group AA)/(group BB): Occurrence in all, group AA and group BB, respectively;

  12. Odds Ratio (95% CI): 95% confidence interval odds ratio between group AA and group BB.

Please open the below buttons, if you want to see other options for differential analysis in ALDEx2.

run_da() pattern

We also provide another function run_da to run ALDEx2 differential analysis.

DA_ALDEx2 <- run_da(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"),
                da_method = "aldex",
                method = "t.test")
## |------------(25%)----------(50%)----------(75%)----------|
colnames(DA_ALDEx2)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "Pvalue"                            "AdjustedPvalue"                   
##  [7] "Median CLR \n(All)"                "Median CLR\nAA"                    "Median CLR\nBB"                   
## [10] "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"           "Median Abundance\nAA"             
## [13] "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"   "Mean Abundance\n(All)"            
## [16] "Mean Abundance\nAA"                "Mean Abundance\nBB"                "Occurrence (100%)\n(All)"         
## [19] "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"             "Odds Ratio (95% CI)"
otu_table or sample_table as inputdata

We also provide data_otu and data_sam as input data to run run_aldex.

DA_ALDEx2 <- run_aldex(
                data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                group = "Group",
                group_names = c("AA", "BB"),
                method = "t.test")
## |------------(25%)----------(50%)----------(75%)----------|
colnames(DA_ALDEx2)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "Pvalue"                            "AdjustedPvalue"                   
##  [7] "Median CLR \n(All)"                "Median CLR\nAA"                    "Median CLR\nBB"                   
## [10] "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"           "Median Abundance\nAA"             
## [13] "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"   "Mean Abundance\n(All)"            
## [16] "Mean Abundance\nAA"                "Mean Abundance\nBB"                "Occurrence (100%)\n(All)"         
## [19] "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"             "Odds Ratio (95% CI)"
taxa_level option

We also provide taxa_level for choosing the specific taxonomic level to run run_aldex.

DA_ALDEx2 <- run_aldex(
                ps = dada2_ps,
                taxa_level = "Genus",
                group = "Group",
                group_names = c("AA", "BB"),
                method = "t.test")
## |------------(25%)----------(50%)----------(75%)----------|
colnames(DA_ALDEx2)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "Pvalue"                            "AdjustedPvalue"                   
##  [7] "Median CLR \n(All)"                "Median CLR\nAA"                    "Median CLR\nBB"                   
## [10] "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"           "Median Abundance\nAA"             
## [13] "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"   "Mean Abundance\n(All)"            
## [16] "Mean Abundance\nAA"                "Mean Abundance\nBB"                "Occurrence (100%)\n(All)"         
## [19] "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"             "Odds Ratio (95% CI)"

10.3.2 limma_voom

limma package is from voom: Precision weights unlock linear model analysis tools for RNA-seq read counts (Law et al. 2014). Firstly, transforming count data to log2-counts per million (logCPM), estimate the mean-variance relationship and use this to compute appropriate observation-level weights. Secondly, fitting multiple linear models by weighted or generalized least squares. Finally, performing empirical bayes statistics for differential expression.

run_limma_voom provides 11 parameters. For instance, norm and transform are used to normalization and transformation input data. More details to see help(run_limma_voom).

DA_limma_voom <- run_limma_voom(
                    ps = dada2_ps_genus_filter_trim,
                    group = "Group",
                    group_names = c("AA", "BB"))

colnames(DA_limma_voom)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "logFC"                             "Pvalue"                           
##  [7] "AdjustedPvalue"                    "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"          
## [10] "Median Abundance\nAA"              "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"  
## [13] "Mean Abundance\n(All)"             "Mean Abundance\nAA"                "Mean Abundance\nBB"               
## [16] "Occurrence (100%)\n(All)"          "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"            
## [19] "Odds Ratio (95% CI)"
head(DA_limma_voom)
##               TaxaID         Block Enrichment  EffectSize       logFC    Pvalue AdjustedPvalue
## 1 g__Acidaminococcus 9_AA vs 14_BB  Nonsignif  0.09424356  0.09424356 0.9451632      0.9895209
## 2     g__Actinomyces 9_AA vs 14_BB  Nonsignif  0.42751966  0.42751966 0.7781674      0.9489846
## 3   g__Adlercreutzia 9_AA vs 14_BB  Nonsignif  1.76792390  1.76792390 0.1726319      0.7804151
## 4    g__Agathobacter 9_AA vs 14_BB  Nonsignif  0.27153770  0.27153770 0.8944583      0.9722372
## 5     g__Akkermansia 9_AA vs 14_BB  Nonsignif -0.24224151 -0.24224151 0.8457202      0.9610457
## 6       g__Alistipes 9_AA vs 14_BB  Nonsignif -0.41145677 -0.41145677 0.8141009      0.9491533
##   Log2FoldChange (Median)\nAA_vs_BB Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB
## 1                                NA                       0                    0                  0.0
## 2                        -0.9577718                      30                   26                 50.5
## 3                                NA                       0                    0                  9.0
## 4                         0.1524904                     316                  329                296.0
## 5                                NA                       0                    0                  0.0
## 6                        -0.1085245                      64                   64                 69.0
##   Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All) Mean Abundance\nAA Mean Abundance\nBB Occurrence (100%)\n(All)
## 1                       1.7051442              58.82609         101.777778           31.21429                    21.74
## 2                      -1.3107676              50.91304          26.777778           66.42857                    82.61
## 3                      -2.4683647              19.21739           5.111111           28.28571                    43.48
## 4                       1.7488297             996.26087        1740.444444          517.85714                    65.22
## 5                      -2.1676332              93.78261          30.000000          134.78571                    21.74
## 6                       0.2000385             163.86957         177.888889          154.85714                    73.91
##   Occurrence (100%)\nAA Occurrence (100%)\nBB Odds Ratio (95% CI)
## 1                 22.22                 21.43    0.67 (-0.11;1.5)
## 2                 88.89                 78.57       4.1 (6.8;1.3)
## 3                 33.33                 50.00        8.6 (13;4.3)
## 4                 55.56                 71.43     0.39 (-1.4;2.2)
## 5                 22.22                 21.43      1.5 (2.3;0.71)
## 6                 77.78                 71.43     0.88 (0.64;1.1)

Results:

The results comprises more than 10 columns, and the details as follow:

  1. TaxaID: taxa name;

  2. Block: groups’ names and numbers;

  3. Enrichment: enriched direction based on Median Abundance and AdjustedPvalue;

  4. EffectSize: effect size of taxa by limma;

  5. logFC: LogFC from groups’ coefficient by limma;

  6. Pvalue and AdjustedPvalue: significant level of Pvalue and Adjusted-pvalue by limma;

  7. **Log2FoldChange (Median)_vs_BB**: Log2FoldChange (Median Abundance) between group AA and group BB;

  8. Median Abundance (All)/(group AA)/(group BB): Median Abundance in all, group AA and group BB, respectively;

  9. **Log2FoldChange (Mean)_vs_BB**: Log2FoldChange (Mean Abundance) between group AA and group BB;

  10. Mean Abundance (All)/(group AA)/(group BB): Mean Abundance in all, group AA and group BB, respectively;

  11. Occurrence (All)/(group AA)/(group BB): Occurrence in all, group AA and group BB, respectively;

  12. Odds Ratio (95% CI): 95% confidence interval odds ratio between group AA and group BB.

Please open the below buttons, if you want to see other options for differential analysis in limma.

other options for limma-voom
if(0) {
DA_limma_voom <- run_limma_voom(
                    data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                    data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                    group = "Group",
                    group_names = c("AA", "BB"))

DA_limma_voom <- run_limma_voom(
                    ps = dada2_ps,
                    taxa_level = "Genus",
                    group = "Group",
                    group_names = c("AA", "BB"))

DA_limma_voom <- run_da(
                    ps = dada2_ps_genus_filter_trim,
                    group = "Group",
                    group_names = c("AA", "BB"))

head(DA_limma_voom)
}

10.3.3 mbzinb

mbzinb package is from An omnibus test for differential distribution analysis of microbiome sequencing data (J. Chen et al. 2018). It uses zeroinflated negative binomial model to investigate the significant taxa. (Caution: the otu_table must be integers).

DA_mbzinb <- run_mbzinb(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"))
colnames(DA_mbzinb)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "Pvalue"                            "AdjustedPvalue"                    "EffectSize"                       
##  [7] "base.mean"                         "mean.LFC"                          "base.abund"                       
## [10] "abund.LFC"                         "base.prev"                         "prev.change"                      
## [13] "base.disp"                         "disp.LFC"                          "statistic"                        
## [16] "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"           "Median Abundance\nAA"             
## [19] "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"   "Mean Abundance\n(All)"            
## [22] "Mean Abundance\nAA"                "Mean Abundance\nBB"                "Occurrence (100%)\n(All)"         
## [25] "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"             "Odds Ratio (95% CI)"
head(DA_mbzinb)
##                           TaxaID         Block Enrichment      Pvalue AdjustedPvalue   EffectSize   base.mean   mean.LFC
## 1               g__Adlercreutzia 9_AA vs 14_BB  Nonsignif 0.032781937      0.2341567   23.1746032    4.161223  2.6824340
## 2              g__Butyricicoccus 9_AA vs 14_BB  Nonsignif 0.019009756      0.1564333   61.0000000  231.439059 -0.1466880
## 3 g__Clostridium_sensu_stricto_1 9_AA vs 14_BB  Nonsignif 0.007406534      0.1564333  399.5714286   10.618571  4.5552518
## 4                g__Enterococcus 9_AA vs 14_BB  Nonsignif 0.020336331      0.1564333   60.7460317   32.658840  1.5654364
## 5            g__Faecalibacterium 9_AA vs 14_BB  Nonsignif 0.068916032      0.3828668 2742.6031746 3229.238369 -1.1278995
## 6               g__Gordonibacter 9_AA vs 14_BB  Nonsignif 0.017121597      0.1564333    0.3888889    7.074759  0.1774837
##    base.abund  abund.LFC base.prev prev.change  base.disp  disp.LFC statistic Log2FoldChange (Median)\nAA_vs_BB
## 1   12.483819  2.0941415 0.3333293  0.16782006 0.05915642  3.206884  8.751636                                NA
## 2  297.532163  0.1283709 0.7778623 -0.13502400 0.04079382  3.619904  9.948415                          2.174498
## 3   46.181531  2.4345380 0.2299311  0.77006016 1.20536377  1.475574 11.993187                                NA
## 4   97.952015  0.4498616 0.3334167  0.38903504 0.04624961  4.624979  9.800932                                NA
## 5 4844.268667 -1.2274983 0.6666101  0.04764622 0.22640364  1.505613  7.095478                          2.400169
## 6    9.837094  0.9244114 0.7191920 -0.29064622 0.96279691 -5.167700 10.176792                                NA
##   Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All)
## 1                       0                    0                  9.0                     -2.46836474             19.217391
## 2                     282                  316                 70.0                      0.36332246            236.869565
## 3                      11                    0                 39.5                     -4.88463779            257.217391
## 4                      36                    0                 51.5                     -1.42341563             73.086957
## 5                    1373                 4537                859.5                      1.43691832           2679.478261
## 6                       2                    2                  0.0                     -0.06030051              9.347826
##   Mean Abundance\nAA Mean Abundance\nBB Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB
## 1           5.111111           28.28571                    43.48                 33.33                 50.00
## 2         274.000000          213.00000                    69.57                 77.78                 64.29
## 3          14.000000          413.57143                    60.87                 22.22                 85.71
## 4          36.111111           96.85714                    56.52                 33.33                 71.43
## 5        4348.888889         1606.28571                    69.57                 66.67                 71.43
## 6           9.111111            9.50000                    52.17                 66.67                 42.86
##   Odds Ratio (95% CI)
## 1        8.6 (13;4.3)
## 2      0.76 (0.2;1.3)
## 3 11000 (11000;11000)
## 4      1.9 (3.2;0.64)
## 5     0.37 (-1.6;2.3)
## 6      1.1 (1.2;0.93)

Results:

The results comprises more than 10 columns, and the details as follow:

  1. TaxaID: taxa name;

  2. Block: groups’ names and numbers;

  3. Enrichment: enriched direction based on Median Abundance and AdjustedPvalue;

  4. EffectSize: effect size of taxa by mbzinb;

  5. Pvalue and AdjustedPvalue: significant level of Pvalue and Adjusted-pvalue by mbzinb;

  6. base.mean: fitted mean abundance parameter times the fitted prevalence in baseline group;

  7. mean.LFC: log2-fold change in fitted mean between other group and baseline;

  8. base.abund: fitted mean abundance parameter in baseline group;

  9. abund.LFC: log2-fold change in fitted mean abundance parameter between other group and baseline;

  10. base.prev: fitted prevalence in baseline group;

  11. prev.change: (linear) difference in prevalence between baseline group and other group (other-baseline);

  12. base.disp: fitted dispersion parameter in baseline group;

  13. disp.LFC: log2-fold change in fitted dispersion parameter between other group and baseline;

  14. statistic: value of likelihood ratio test statistic;

  15. **Log2FoldChange (Median)_vs_BB**: Log2FoldChange (Median Abundance) between group AA and group BB;

  16. Median Abundance (All)/(group AA)/(group BB): Median Abundance in all, group AA and group BB, respectively;

  17. **Log2FoldChange (Mean)_vs_BB**: Log2FoldChange (Mean Abundance) between group AA and group BB;

  18. Mean Abundance (All)/(group AA)/(group BB): Mean Abundance in all, group AA and group BB, respectively;

  19. Occurrence (All)/(group AA)/(group BB): Occurrence in all, group AA and group BB, respectively;

  20. Odds Ratio (95% CI): 95% confidence interval odds ratio between group AA and group BB.

Please open the below buttons, if you want to see other options for differential analysis in mbzinb.

other options for mbzinb
if(0) {
DA_mbzinb <- run_mbzinb(
                data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                group = "Group",
                group_names = c("AA", "BB"))

DA_mbzinb <- run_mbzinb(
                ps = dada2_ps,
                taxa_level = "Genus",
                group = "Group",
                group_names = c("AA", "BB"))

DA_mbzinb <- run_da(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"))

head(DA_mbzinb)
}

10.3.4 omnibus

This approach is also from mbzinb (J. Chen et al. 2018) package. it uses GMPR (Geometric Mean of Pairwise Ratios) (L. Chen et al. 2018) to get the size factors.

where we specify models for count (abundance), zero (prevalence) and dispersion part. We also provide likelihood ratio test (zinb.lrt) for different models.

(Caution: the otu_table must be integers).

DA_omnibus <- run_omnibus(
                  ps = dada2_ps_genus_filter_trim,
                  group = "Group",
                  group_names = c("AA", "BB"),
                  method = "omnibus")
## Start GMPR normalization ...
## Start Winsorization ...
## Perform filtering ...
## --A total of  92  taxa will be tested with a sample size of 23 !
## --Omnibus test is selected!
## --Dispersion is treated as a parameter of interest!
## Start testing ...
## 10 %
## 20 %
## 30 %
## 40 %
## 50 %
## 60 %
## 70 %
## 80 %
## 90 %
## 100%!
## Handle failed taxa using permutation test!
## Permutation test ....
## Completed!
colnames(DA_omnibus)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "Pvalue"                            "AdjustedPvalue"                    "EffectSize"                       
##  [7] "chi.stat"                          "df"                                "abund.baseline"                   
## [10] "prev.baseline"                     "dispersion.baseline"               "abund.LFC.CompvarBB.est"          
## [13] "abund.LFC.CompvarBB.se"            "prev.LOD.CompvarBB.est"            "prev.LOD.CompvarBB.se"            
## [16] "dispersion.LFC.CompvarBB.est"      "dispersion.LFC.CompvarBB.se"       "method"                           
## [19] "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"           "Median Abundance\nAA"             
## [22] "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"   "Mean Abundance\n(All)"            
## [25] "Mean Abundance\nAA"                "Mean Abundance\nBB"                "Occurrence (100%)\n(All)"         
## [28] "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"             "Odds Ratio (95% CI)"
head(DA_omnibus)
##               TaxaID         Block Enrichment     Pvalue AdjustedPvalue EffectSize  chi.stat df abund.baseline prev.baseline
## 1 g__Acidaminococcus 9_AA vs 14_BB  Nonsignif 0.26476211      0.6245670   70.56349 3.9696315  3   0.0050842054     0.2910300
## 2     g__Actinomyces 9_AA vs 14_BB  Nonsignif 0.13905545      0.4437791   39.65079 5.4930407  3   0.0006912502     0.8894593
## 3   g__Adlercreutzia 9_AA vs 14_BB  Nonsignif 0.06293962      0.3795000   23.17460 7.2995223  3   0.0003647798     0.3333734
## 4    g__Agathobacter 9_AA vs 14_BB  Nonsignif 0.12024562      0.4437791 1222.58730 5.8287526  3   0.0553800205     0.5555645
## 5     g__Akkermansia 9_AA vs 14_BB  Nonsignif 0.86344681      0.9142217  104.78571 0.7413110  3   0.0031212219     0.2357130
## 6       g__Alistipes 9_AA vs 14_BB  Nonsignif 0.97967175      0.9796718   23.03175 0.1869261  3   0.0049314397     0.7815505
##   dispersion.baseline abund.LFC.CompvarBB.est abund.LFC.CompvarBB.se prev.LOD.CompvarBB.est prev.LOD.CompvarBB.se
## 1           0.2041100             -0.36978708              1.4040484             -0.4088983             0.9811747
## 2           3.1647486              0.73754098              0.3062639             -0.7727912             1.2480678
## 3          11.0696374              1.29262130              0.3614490              0.6959681             0.8863794
## 4           1.6324698             -1.15572656              0.4756819              0.6968830             0.8947568
## 5           0.5149302             -0.83042868              1.7432216              2.3976744             1.0114415
## 6           1.0012623              0.03215791              0.4770876             -0.3541131             1.0007538
##   dispersion.LFC.CompvarBB.est dispersion.LFC.CompvarBB.se  method Log2FoldChange (Median)\nAA_vs_BB Median Abundance\n(All)
## 1                    2.8343543                   1.1361534 omnibus                                NA                       0
## 2                   -0.5318657                   0.6766040 omnibus                        -0.9577718                      30
## 3                   -1.7738398                   1.4812754 omnibus                                NA                       0
## 4                   -0.5257607                   0.7011677 omnibus                         0.1524904                     316
## 5                   -2.4675480                   1.0580187 omnibus                                NA                       0
## 6                    0.1662740                   0.6265328 omnibus                        -0.1085245                      64
##   Median Abundance\nAA Median Abundance\nBB Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All) Mean Abundance\nAA
## 1                    0                  0.0                       1.7051442              58.82609         101.777778
## 2                   26                 50.5                      -1.3107676              50.91304          26.777778
## 3                    0                  9.0                      -2.4683647              19.21739           5.111111
## 4                  329                296.0                       1.7488297             996.26087        1740.444444
## 5                    0                  0.0                      -2.1676332              93.78261          30.000000
## 6                   64                 69.0                       0.2000385             163.86957         177.888889
##   Mean Abundance\nBB Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB Odds Ratio (95% CI)
## 1           31.21429                    21.74                 22.22                 21.43    0.67 (-0.11;1.5)
## 2           66.42857                    82.61                 88.89                 78.57       4.1 (6.8;1.3)
## 3           28.28571                    43.48                 33.33                 50.00        8.6 (13;4.3)
## 4          517.85714                    65.22                 55.56                 71.43     0.39 (-1.4;2.2)
## 5          134.78571                    21.74                 22.22                 21.43      1.5 (2.3;0.71)
## 6          154.85714                    73.91                 77.78                 71.43     0.88 (0.64;1.1)

Results:

The results comprises more than 10 columns, and the details as follow:

  1. TaxaID: taxa name;

  2. Block: groups’ names and numbers;

  3. Enrichment: enriched direction based on Median Abundance and AdjustedPvalue;

  4. EffectSize: effect size of taxa by glm function to assess the pvalue effect;

  5. Pvalue and AdjustedPvalue: significant level of Pvalue and Adjusted-pvalue by omnibus;

  6. chi.stat: chisquare statistics;

  7. df: degree freedom;

  8. abund.baseline: mean abundance in baseline group;

  9. prev.baseline: prevalence in baseline group;

  10. dispersion.baseline: dispersion in baseline group;

  11. abund.LFC.CompvarBB.est: log2-fold change in abundance between group BB and other group;

  12. abund.LFC.CompvarBB.se: log2-fold change of standard errors in abundance between group BB and other group;

  13. prev.LOD.CompvarBB.est: prevalence of low of detect value in group BB;

  14. prev.LOD.CompvarBB.se: prevalence’s standard errors of low of detect value in group BB;

  15. dispersion.LFC.CompvarBB.est: log2-fold change in dispersion in group BB;

  16. dispersion.LFC.CompvarBB.se: log2-fold change in dispersion’s standard errors in group BB;

  17. method: methods of test;

  18. **Log2FoldChange (Median)_vs_BB**: Log2FoldChange (Median Abundance) between group AA and group BB;

  19. Median Abundance (All)/(group AA)/(group BB): Median Abundance in all, group AA and group BB, respectively;

  20. **Log2FoldChange (Mean)_vs_BB**: Log2FoldChange (Mean Abundance) between group AA and group BB;

  21. Mean Abundance (All)/(group AA)/(group BB): Mean Abundance in all, group AA and group BB, respectively;

  22. Occurrence (All)/(group AA)/(group BB): Occurrence in all, group AA and group BB, respectively;

  23. Odds Ratio (95% CI): 95% confidence interval odds ratio between group AA and group BB.

Please open the below buttons, if you want to see other options for differential analysis in omnibus.

other options for omnibus
if(0) {
DA_omnibus <- run_omnibus(
                  data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                  data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                  group = "Group",
                  group_names = c("AA", "BB"),
                  method = "omnibus")


DA_omnibus <- run_omnibus(
                  ps = dada2_ps,
                  taxa_level = "Genus",
                  group = "Group",
                  group_names = c("AA", "BB"),
                  method = "omnibus")

DA_omnibus <- run_da(
                  ps = dada2_ps_genus_filter_trim,
                  group = "Group",
                  group_names = c("AA", "BB"),
                  da_method = "omnibus",
                  method = "omnibus")

head(DA_omnibus)
}

10.3.5 RAIDA

RAIDA package is from A robust approach for identifying differentially abundant features in metagenomic samples (Sohn, Du, and An 2015). It uses Ratio Approach for Identifying Differential Abundance (RAIDA). (Caution: the otu_table must be integers).

DA_RAIDA <- run_raida(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"))

colnames(DA_RAIDA)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "Pvalue"                            "AdjustedPvalue"                   
##  [7] "eta_AA"                            "mean_AA"                           "sd_AA"                            
## [10] "eta_BB"                            "mean_BB"                           "sd_BB"                            
## [13] "mod.pool.var"                      "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"          
## [16] "Median Abundance\nAA"              "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"  
## [19] "Mean Abundance\n(All)"             "Mean Abundance\nAA"                "Mean Abundance\nBB"               
## [22] "Occurrence (100%)\n(All)"          "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"            
## [25] "Odds Ratio (95% CI)"
head(DA_RAIDA)
##               TaxaID         Block Enrichment  EffectSize    Pvalue AdjustedPvalue    eta_AA    mean_AA     sd_AA    eta_BB
## 1 g__Acidaminococcus 9_AA vs 14_BB  Nonsignif   70.563492 0.3816518      0.7379802 0.7777778 -1.1787353 3.4581703 0.7857143
## 2     g__Actinomyces 9_AA vs 14_BB  Nonsignif   39.650794 0.3655566      0.7379802 0.1111111 -3.5858713 1.3594755 0.2142857
## 3   g__Adlercreutzia 9_AA vs 14_BB  Nonsignif   23.174603 0.1074479      0.7353741 0.6666667 -4.4427957 0.8570999 0.5000000
## 4    g__Agathobacter 9_AA vs 14_BB  Nonsignif 1222.587302 0.1143915      0.7353741 0.4444444  0.1696923 1.1949839 0.2857143
## 5     g__Akkermansia 9_AA vs 14_BB  Nonsignif  104.785714 0.6418257      0.8494752 0.7777778 -3.7046260 2.3018074 0.7857143
## 6     g__Allisonella 9_AA vs 14_BB  Nonsignif    3.603175 0.8515585      0.9461761 0.7777778 -3.0106395 0.5628666 0.7142857
##      mean_BB     sd_BB mod.pool.var Log2FoldChange (Median)\nAA_vs_BB Median Abundance\n(All) Median Abundance\nAA
## 1 -2.5738907 1.3069089     2.756248                                NA                       0                    0
## 2 -2.9198320 1.8200871     2.410794                        -0.9577718                      30                   26
## 3 -3.0988711 1.0974531     1.280317                                NA                       0                    0
## 4 -0.8127871 0.9057351     1.174178                         0.1524904                     316                  329
## 5 -4.4749687 2.5368159     3.071523                                NA                       0                    0
## 6 -2.8074387 1.3346198     1.491825                                NA                       0                    0
##   Median Abundance\nBB Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All) Mean Abundance\nAA Mean Abundance\nBB
## 1                  0.0                       1.7051442              58.82609         101.777778           31.21429
## 2                 50.5                      -1.3107676              50.91304          26.777778           66.42857
## 3                  9.0                      -2.4683647              19.21739           5.111111           28.28571
## 4                296.0                       1.7488297             996.26087        1740.444444          517.85714
## 5                  0.0                      -2.1676332              93.78261          30.000000          134.78571
## 6                  0.0                      -0.4052144              13.30435          11.111111           14.71429
##   Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB Odds Ratio (95% CI)
## 1                    21.74                 22.22                 21.43    0.67 (-0.11;1.5)
## 2                    82.61                 88.89                 78.57       4.1 (6.8;1.3)
## 3                    43.48                 33.33                 50.00        8.6 (13;4.3)
## 4                    65.22                 55.56                 71.43     0.39 (-1.4;2.2)
## 5                    21.74                 22.22                 21.43      1.5 (2.3;0.71)
## 6                    26.09                 22.22                 28.57      1.1 (1.4;0.89)

Results:

The results comprises more than 10 columns, and the details as follow:

  1. TaxaID: taxa name;

  2. Block: groups’ names and numbers;

  3. Enrichment: enriched direction based on Median Abundance and AdjustedPvalue;

  4. EffectSize: effect size of taxa by glm function to assess the pvalue effect;

  5. Pvalue and AdjustedPvalue: significant level of Pvalue and Adjusted-pvalue by RAIDA;

  6. eta_AA: vector containing estimated probabilities of the false zero state for group AA;

  7. mean_AA: vector containing estimated means of log ratios for group AA;

  8. sd_AA: vector containing estimated standard deviations of log ratios for group AA;

  9. eta_BB: vector containing estimated probabilities of the false zero state for group BB;

  10. mean_BB: vector containing estimated means of log ratios for group BB;

  11. sd_BB: vector containing estimated standard deviations of log ratios for group BB;

  12. mod.pool.var: vector containing estimated posterior variances of log ratios;

  13. **Log2FoldChange (Median)_vs_BB**: Log2FoldChange (Median Abundance) between group AA and group BB;

  14. Median Abundance (All)/(group AA)/(group BB): Median Abundance in all, group AA and group BB, respectively;

  15. **Log2FoldChange (Mean)_vs_BB**: Log2FoldChange (Mean Abundance) between group AA and group BB;

  16. Mean Abundance (All)/(group AA)/(group BB): Mean Abundance in all, group AA and group BB, respectively;

  17. Occurrence (All)/(group AA)/(group BB): Occurrence in all, group AA and group BB, respectively;

  18. Odds Ratio (95% CI): 95% confidence interval odds ratio between group AA and group BB.

Please open the below buttons, if you want to see other options for differential analysis in RAIDA.

other options for RAIDA
if (0) {
DA_RAIDA <- run_raida(
                data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                group = "Group",
                group_names = c("AA", "BB"))

DA_RAIDA <- run_raida(
                ps = dada2_ps,
                taxa_level = "Genus",
                group = "Group",
                group_names = c("AA", "BB"))

DA_RAIDA <- run_da(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"),
                da_method = "raida")

head(DA_RAIDA)
}

10.3.6 Wilcoxon Rank Sum and Signed Rank Tests

Wilcoxon Rank Sum and Signed Rank Tests, which are nonparametric test methods, use the rank of taxa abundance to find the significant taxa.

  • Ordinary pattern
DA_wilcox <- run_wilcox(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"))

colnames(DA_wilcox)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "Statistic"                         "Pvalue"                           
##  [7] "AdjustedPvalue"                    "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"          
## [10] "Median Abundance\nAA"              "Median Abundance\nBB"              "Log2FoldChange (Rank)\nAA_vs_BB"  
## [13] "Mean Rank Abundance\nAA"           "Mean Rank Abundance\nBB"           "Occurrence (100%)\n(All)"         
## [16] "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"             "Odds Ratio (95% CI)"
head(DA_wilcox)
##               TaxaID         Block Enrichment EffectSize Statistic    Pvalue AdjustedPvalue Log2FoldChange (Median)\nAA_vs_BB
## 1 g__Acidaminococcus 9_AA vs 14_BB  Nonsignif   70.56349      63.5 1.0000000      1.0000000                                NA
## 2     g__Actinomyces 9_AA vs 14_BB  Nonsignif   39.65079      44.5 0.2556616      0.7014746                        -0.9577718
## 3   g__Adlercreutzia 9_AA vs 14_BB  Nonsignif   23.17460      44.0 0.1980169      0.7014746                                NA
## 4    g__Agathobacter 9_AA vs 14_BB  Nonsignif 1222.58730      71.0 0.6293971      0.8964389                         0.1524904
## 5     g__Akkermansia 9_AA vs 14_BB  Nonsignif  104.78571      64.5 0.9304707      0.9692403                                NA
## 6       g__Alistipes 9_AA vs 14_BB  Nonsignif   23.03175      67.0 0.8239942      0.9363570                        -0.1085245
##   Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB Log2FoldChange (Rank)\nAA_vs_BB Mean Rank Abundance\nAA
## 1                       0                    0                  0.0                      0.01201252                   12.06
## 2                      30                   26                 50.5                     -0.42227633                    9.94
## 3                       0                    0                  9.0                     -0.43387758                    9.89
## 4                     316                  329                296.0                      0.17342686                   12.89
## 5                       0                    0                  0.0                      0.03358045                   12.17
## 6                      64                   64                 69.0                      0.08724541                   12.44
##   Mean Rank Abundance\nBB Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB Odds Ratio (95% CI)
## 1                   11.96                    21.74                 22.22                 21.43    0.67 (-0.11;1.5)
## 2                   13.32                    82.61                 88.89                 78.57       4.1 (6.8;1.3)
## 3                   13.36                    43.48                 33.33                 50.00        8.6 (13;4.3)
## 4                   11.43                    65.22                 55.56                 71.43     0.39 (-1.4;2.2)
## 5                   11.89                    21.74                 22.22                 21.43      1.5 (2.3;0.71)
## 6                   11.71                    73.91                 77.78                 71.43     0.88 (0.64;1.1)

Results:

The results comprises more than 10 columns, and the details as follow:

  1. TaxaID: taxa name;

  2. Block: groups’ names and numbers;

  3. Enrichment: enriched direction based on Median Abundance and AdjustedPvalue;

  4. EffectSize: effect size of taxa by glm function to assess the pvalue effect;

  5. Pvalue and AdjustedPvalue: significant level of Pvalue and Adjusted-pvalue;

  6. **Log2FoldChange (Median)_vs_BB**: Log2FoldChange (Median Abundance) between group AA and group BB;

  7. Median Abundance (All)/(group AA)/(group BB): Median Abundance in all, group AA and group BB, respectively;

  8. **Log2FoldChange (Mean Rank)_vs_BB**: Log2FoldChange (Mean Rank Abundance) between group AA and group BB;

  9. Mean Rank Abundance (All)/(group AA)/(group BB): Mean Rank Abundance in all, group AA and group BB, respectively;

  10. **Log2FoldChange (Mean)_vs_BB**: Log2FoldChange (Mean Abundance) between group AA and group BB;

  11. Mean Abundance (All)/(group AA)/(group BB): Mean Abundance in all, group AA and group BB, respectively;

  12. Occurrence (All)/(group AA)/(group BB): Occurrence in all, group AA and group BB, respectively;

  13. Odds Ratio (95% CI): 95% confidence interval odds ratio between group AA and group BB.

other options for wilcox
if (0) {
DA_wilcox <- run_wilcox(
                data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                group = "Group",
                group_names = c("AA", "BB"))

DA_wilcox <- run_wilcox(
                ps = dada2_ps,
                taxa_level = "Genus",
                group = "Group",
                group_names = c("AA", "BB"))

DA_wilcox <- run_da(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"),
                da_method = "wilcox")

head(DA_wilcox)
}
  • wilcox_rarefy: random subsampling counts to the smallest library size in the data set.
# summary
summarize_phyloseq(ps = dada2_ps_genus_filter_trim)
## [[1]]
## [1] "1] Min. number of reads = 32359"
## 
## [[2]]
## [1] "2] Max. number of reads = 49328"
## 
## [[3]]
## [1] "3] Total number of reads = 1008016"
## 
## [[4]]
## [1] "4] Average number of reads = 43826.7826086957"
## 
## [[5]]
## [1] "5] Median number of reads = 45596"
## 
## [[6]]
## [1] "7] Sparsity = 0.501304347826087"
## 
## [[7]]
## [1] "6] Any OTU sum to 1 or less? NO"
## 
## [[8]]
## [1] "8] Number of singletons = 0"
## 
## [[9]]
## [1] "9] Percent of OTUs that are singletons\n        (i.e. exactly one read detected across all samples)0"
## 
## [[10]]
## [1] "10] Number of sample variables are: 1"
## 
## [[11]]
## [1] "Group"
# run
DA_wilcox_rarefy <- run_wilcox(
                        ps = dada2_ps_genus_filter_trim,
                        group = "Group",
                        group_names = c("AA", "BB"),
                        norm = "rarefy")
colnames(DA_wilcox_rarefy)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "Statistic"                         "Pvalue"                           
##  [7] "AdjustedPvalue"                    "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"          
## [10] "Median Abundance\nAA"              "Median Abundance\nBB"              "Log2FoldChange (Rank)\nAA_vs_BB"  
## [13] "Mean Rank Abundance\nAA"           "Mean Rank Abundance\nBB"           "Occurrence (100%)\n(All)"         
## [16] "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"             "Odds Ratio (95% CI)"

The output is the same as the previous results of ordinary pattern.

other options for wilcox_rarefy
if (0) {
DA_wilcox_rarefy <- run_da(
                        ps = dada2_ps_genus_filter_trim,
                        group = "Group",
                        group_names = c("AA", "BB"),
                        da_method = "wilcox",
                        norm = "rarefy")

head(DA_wilcox_rarefy)
}
  • wilcox_CLR: centered log-ratio normalization.
DA_wilcox_CLR <- run_wilcox(
                    ps = dada2_ps_genus_filter_trim,
                    group = "Group",
                    group_names = c("AA", "BB"),
                    norm = "CLR")

colnames(DA_wilcox_CLR)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "Statistic"                         "Pvalue"                           
##  [7] "AdjustedPvalue"                    "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"          
## [10] "Median Abundance\nAA"              "Median Abundance\nBB"              "Log2FoldChange (Rank)\nAA_vs_BB"  
## [13] "Mean Rank Abundance\nAA"           "Mean Rank Abundance\nBB"           "Occurrence (100%)\n(All)"         
## [16] "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"             "Odds Ratio (95% CI)"

The output is the same as the previous results of ordinary pattern.

other options for wilcox_CLR
if(0) {
DA_wilcox_CLR <- run_da(
                    ps = dada2_ps_genus_filter_trim,
                    group = "Group",
                    group_names = c("AA", "BB"),
                    da_method = "wilcox",
                    norm = "CLR")

head(DA_wilcox_CLR)
}

10.3.7 Liner discriminant analysis (LDA) effect size (LEfSe)

LEfSe method is from Metagenomic biomarker discovery and explanation (Segata et al. 2011). It uses Liner discriminant analysis model to identify Differential Taxa.

DA_lefse <- run_lefse(
                    ps = dada2_ps_genus_filter_trim,
                    group = "Group",
                    group_names = c("AA", "BB"),
                    norm = "CPM",
                    Lda = 2)
colnames(DA_lefse)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "LDA_Score"                         "EffectSize"                        "Log2FoldChange (Median)\nAA_vs_BB"
##  [7] "Median Abundance\n(All)"           "Median Abundance\nAA"              "Median Abundance\nBB"             
## [10] "Log2FoldChange (Mean)\nAA_vs_BB"   "Mean Abundance\n(All)"             "Mean Abundance\nAA"               
## [13] "Mean Abundance\nBB"                "Occurrence (100%)\n(All)"          "Occurrence (100%)\nAA"            
## [16] "Occurrence (100%)\nBB"             "Odds Ratio (95% CI)"
head(DA_lefse)
##                           TaxaID         Block Enrichment LDA_Score EffectSize Log2FoldChange (Median)\nAA_vs_BB
## 1 g__Clostridium_sensu_stricto_1 9_AA vs 14_BB         BB  3.671138   2.679298                                NA
## 2             g__Intestinibacter 9_AA vs 14_BB         BB  3.146337   2.384308                                NA
## 3               g__Lactobacillus 9_AA vs 14_BB         BB  4.256710   2.450656                         -3.336128
## 4                 g__Odoribacter 9_AA vs 14_BB         BB  2.340309   1.876320                                NA
## 5              g__Parasutterella 9_AA vs 14_BB         AA -3.604579   2.234915                          4.402050
## 6                  g__Romboutsia 9_AA vs 14_BB         BB  3.705317   2.929140                         -6.856424
##   Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All)
## 1               225.11921              0.00000           1173.88222                       -4.746617             5861.9051
## 2               383.50911              0.00000           1398.79740                       -1.964554             1883.3386
## 3              1283.85401            520.46061           5256.08763                       -5.173067            19938.7655
## 4                21.93175            604.99989              0.00000                        2.099583              578.0758
## 5               172.77125           1080.35695             51.09966                        4.481148             3901.0977
## 6              2189.79596             50.35627           5835.02623                       -3.334490             7094.1306
##   Mean Abundance\nAA Mean Abundance\nBB Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB
## 1           350.3379          9405.0554                    60.87                 22.22                 85.71
## 2           680.6440          2656.4994                    60.87                 22.22                 85.71
## 3           892.0297         32183.0957                    86.96                 77.78                 92.86
## 4          1083.9017           252.9020                    52.17                 77.78                 35.71
## 5          9320.3061           417.3209                    65.22                 88.89                 50.00
## 6          1086.1419         10956.4090                    78.26                 55.56                 92.86
##         Odds Ratio (95% CI)
## 1          4900 (5000;4900)
## 2               3.4 (5.8;1)
## 3 1.1e+08 (1.1e+08;1.1e+08)
## 4             0.31 (-2;2.6)
## 5           0.0025 (-12;12)
## 6                76 (85;68)

Results:

The results comprises more than 10 columns, and the details as follow:

  1. TaxaID: taxa name;

  2. Block: groups’ names and numbers;

  3. Enrichment: enriched direction based on Median Abundance and AdjustedPvalue;

  4. EffectSize: effect size of taxa by lefse;

  5. LDA_Score: significant level of Pvalue and Adjusted-pvalue;

  6. **Log2FoldChange (Median)_vs_BB**: Log2FoldChange (Median Abundance) between group AA and group BB;

  7. Median Abundance (All)/(group AA)/(group BB): Median Abundance in all, group AA and group BB, respectively;

  8. **Log2FoldChange (Mean)_vs_BB**: Log2FoldChange (Mean Abundance) between group AA and group BB;

  9. Mean Abundance (All)/(group AA)/(group BB): Mean Abundance in all, group AA and group BB, respectively;

  10. Occurrence (All)/(group AA)/(group BB): Occurrence in all, group AA and group BB, respectively;

  11. Odds Ratio (95% CI): 95% confidence interval odds ratio between group AA and group BB.

other options for lefse
if (0) {
DA_lefse <- run_lefse(
                data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                group = "Group",
                group_names = c("AA", "BB"),
                norm = "CPM",
                Lda = 0)

DA_lefse <- run_lefse(
                ps = dada2_ps,
                taxa_level = "Genus",
                group = "Group",
                group_names = c("AA", "BB"),
                norm = "CPM",
                Lda = 0)

DA_lefse <- run_da(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"),
                da_method = "lefse",
                norm = "CPM",
                Lda = 0)

head(DA_lefse)
}

10.3.8 t-test

T test, a parametric test method, identifies the significant taxa.

  • Ordinary pattern
DA_ttest <- run_ttest(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"))

colnames(DA_ttest)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "Statistic"                         "Pvalue"                           
##  [7] "AdjustedPvalue"                    "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"          
## [10] "Median Abundance\nAA"              "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"  
## [13] "Mean Abundance\n(All)"             "Mean Abundance\nAA"                "Mean Abundance\nBB"               
## [16] "Occurrence (100%)\n(All)"          "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"            
## [19] "Odds Ratio (95% CI)"
head(DA_ttest)
##               TaxaID         Block Enrichment EffectSize  Statistic     Pvalue AdjustedPvalue Log2FoldChange (Median)\nAA_vs_BB
## 1 g__Acidaminococcus 9_AA vs 14_BB  Nonsignif   70.56349  0.6845254 0.51170983      0.7107081                                NA
## 2     g__Actinomyces 9_AA vs 14_BB  Nonsignif   39.65079 -1.9079866 0.07478702      0.6397026                        -0.9577718
## 3   g__Adlercreutzia 9_AA vs 14_BB  Nonsignif   23.17460 -2.1601919 0.04743155      0.6397026                                NA
## 4    g__Agathobacter 9_AA vs 14_BB  Nonsignif 1222.58730  1.4034718 0.19417159      0.6397026                         0.1524904
## 5     g__Akkermansia 9_AA vs 14_BB  Nonsignif  104.78571 -0.7632086 0.45788140      0.6983360                                NA
## 6       g__Alistipes 9_AA vs 14_BB  Nonsignif   23.03175  0.2691871 0.79125424      0.9059862                        -0.1085245
##   Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All)
## 1                       0                    0                  0.0                       1.7051442              58.82609
## 2                      30                   26                 50.5                      -1.3107676              50.91304
## 3                       0                    0                  9.0                      -2.4683647              19.21739
## 4                     316                  329                296.0                       1.7488297             996.26087
## 5                       0                    0                  0.0                      -2.1676332              93.78261
## 6                      64                   64                 69.0                       0.2000385             163.86957
##   Mean Abundance\nAA Mean Abundance\nBB Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB
## 1         101.777778           31.21429                    21.74                 22.22                 21.43
## 2          26.777778           66.42857                    82.61                 88.89                 78.57
## 3           5.111111           28.28571                    43.48                 33.33                 50.00
## 4        1740.444444          517.85714                    65.22                 55.56                 71.43
## 5          30.000000          134.78571                    21.74                 22.22                 21.43
## 6         177.888889          154.85714                    73.91                 77.78                 71.43
##   Odds Ratio (95% CI)
## 1    0.67 (-0.11;1.5)
## 2       4.1 (6.8;1.3)
## 3        8.6 (13;4.3)
## 4     0.39 (-1.4;2.2)
## 5      1.5 (2.3;0.71)
## 6     0.88 (0.64;1.1)

Results:

The results comprises more than 10 columns, and the details as follow:

  1. TaxaID: taxa name;

  2. Block: groups’ names and numbers;

  3. Enrichment: enriched direction based on Median Abundance and AdjustedPvalue;

  4. EffectSize: effect size of taxa by glm function to assess the pvalue effect;

  5. Pvalue and AdjustedPvalue: significant level of Pvalue and Adjusted-pvalue;

  6. **Log2FoldChange (Median)_vs_BB**: Log2FoldChange (Median Abundance) between group AA and group BB;

  7. Median Abundance (All)/(group AA)/(group BB): Median Abundance in all, group AA and group BB, respectively;

  8. **Log2FoldChange (GeometricMean)_vs_BB**: Log2FoldChange (GeometricMean Abundance) between group AA and group BB;

  9. GeometricMean Abundance (All)/(group AA)/(group BB): GeometricMean Abundance in all, group AA and group BB, respectively;

  10. **Log2FoldChange (Mean)_vs_BB**: Log2FoldChange (Mean Abundance) between group AA and group BB;

  11. Mean Abundance (All)/(group AA)/(group BB): Mean Abundance in all, group AA and group BB, respectively;

  12. Occurrence (All)/(group AA)/(group BB): Occurrence in all, group AA and group BB, respectively;

  13. Odds Ratio (95% CI): 95% confidence interval odds ratio between group AA and group BB.

other options for t-test
if (0) {
DA_ttest <- run_ttest(
                data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                group = "Group",
                group_names = c("AA", "BB"))

DA_ttest <- run_ttest(
                ps = dada2_ps,
                taxa_level = "Genus",
                group = "Group",
                group_names = c("AA", "BB"))

DA_ttest <- run_da(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"),
                da_method = "ttest")

head(DA_ttest)
}
  • t-test_rarefy: random subsampling counts to the smallest library size in the data set.
DA_ttest_rarefy <- run_ttest(
                      ps = dada2_ps_genus_filter_trim,
                      group = "Group",
                      group_names = c("AA", "BB"),
                      norm = "rarefy")
colnames(DA_ttest_rarefy)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "Statistic"                         "Pvalue"                           
##  [7] "AdjustedPvalue"                    "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"          
## [10] "Median Abundance\nAA"              "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"  
## [13] "Mean Abundance\n(All)"             "Mean Abundance\nAA"                "Mean Abundance\nBB"               
## [16] "Occurrence (100%)\n(All)"          "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"            
## [19] "Odds Ratio (95% CI)"

The output is the same as the previous results of ordinary pattern.

other options for ttest_rarefy
if (0) {
DA_ttest_rarefy <- run_da(
                      ps = dada2_ps_genus_filter_trim,
                      group = "Group",
                      group_names = c("AA", "BB"),
                      da_method = "ttest",
                      norm = "rarefy")

head(DA_ttest_rarefy)
}

10.3.9 MetagenomeSeq

MetagenomeSeq package is from Differential abundance analysis for microbial marker-gene surveys (Paulson et al. 2013). It uses zero-inflated Log-Normal mixture model or Zero-inflated Gaussian mixture model to identify the significant taxa between groups. (Caution: the otu_table should be integers).

DA_metagenomeseq <- run_metagenomeseq(
                        ps = dada2_ps_genus_filter_trim,
                        group = "Group",
                        group_names = c("AA", "BB"),
                        norm = "CSS",
                        method = "ZILN")

colnames(DA_metagenomeseq)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "logFC"                             "Pvalue"                            "AdjustedPvalue"                   
##  [7] "se"                                "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"          
## [10] "Median Abundance\nAA"              "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"  
## [13] "Mean Abundance\n(All)"             "Mean Abundance\nAA"                "Mean Abundance\nBB"               
## [16] "Occurrence (100%)\n(All)"          "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"            
## [19] "Odds Ratio (95% CI)"
head(DA_metagenomeseq)
##               TaxaID         Block Enrichment      logFC Pvalue AdjustedPvalue se Log2FoldChange (Median)\nAA_vs_BB
## 1 g__Acidaminococcus 9_AA vs 14_BB  Nonsignif -1.1453226     NA             NA NA                                NA
## 2     g__Actinomyces 9_AA vs 14_BB       <NA>  0.6216876     NA             NA NA                        -0.9965674
## 3   g__Adlercreutzia 9_AA vs 14_BB       <NA>  0.4652896     NA             NA NA                                NA
## 4    g__Agathobacter 9_AA vs 14_BB       <NA> -0.2406551     NA             NA NA                        -0.1149914
## 5     g__Akkermansia 9_AA vs 14_BB  Nonsignif -0.8318088     NA             NA NA                                NA
## 6       g__Alistipes 9_AA vs 14_BB       <NA>  0.2229196     NA             NA NA                         1.2561674
##   Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All)
## 1                 0.00000              0.00000             0.000000                       3.2876853             148.45156
## 2                40.00000             27.28823            54.446764                      -0.5016143              82.10636
## 3                 0.00000              0.00000             7.846556                      -2.7675469              26.03497
## 4               301.55820            301.55820           326.578060                       1.1416934             867.24692
## 5                 0.00000              0.00000             0.000000                      -2.6101945              72.25398
## 6                83.50731            175.82418            73.609609                       0.2975842             189.53970
##   Mean Abundance\nAA Mean Abundance\nBB Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB
## 1         327.248231           33.51085                    21.74                 22.22                 21.43
## 2          65.522916           92.76714                    82.61                 88.89                 78.57
## 3           5.739364           39.08214                    43.48                 33.33                 50.00
## 4        1299.865944          589.13469                    65.22                 55.56                 71.43
## 5          17.588969          107.39578                    21.74                 22.22                 21.43
## 6         213.795329          173.94680                    73.91                 77.78                 71.43
##   Odds Ratio (95% CI)
## 1    0.49 (-0.92;1.9)
## 2      1.4 (1.9;0.76)
## 3        8.7 (13;4.5)
## 4    0.54 (-0.68;1.8)
## 5      1.6 (2.6;0.68)
## 6     0.83 (0.47;1.2)

Results:

The results comprises more than 10 columns, and the details as follow:

  1. TaxaID: taxa name;

  2. Block: groups’ names and numbers;

  3. Enrichment: enriched direction based on Median Abundance and AdjustedPvalue;

  4. logFC: fitted coefficient represents the fold-change for group AA and group BB;

  5. se: standard error;

  6. Pvalue and AdjustedPvalue: significant level of Pvalue and Adjusted-pvalue;

  7. **Log2FoldChange (Median)_vs_BB**: Log2FoldChange (Median Abundance) between group AA and group BB;

  8. Median Abundance (All)/(group AA)/(group BB): Median Abundance in all, group AA and group BB, respectively;

  9. **Log2FoldChange (Mean)_vs_BB**: Log2FoldChange (Mean Abundance) between group AA and group BB;

  10. Mean Abundance (All)/(group AA)/(group BB): Mean Abundance in all, group AA and group BB, respectively;

  11. Occurrence (All)/(group AA)/(group BB): Occurrence in all, group AA and group BB, respectively;

  12. Odds Ratio (95% CI): 95% confidence interval odds ratio between group AA and group BB.

other options for metagenomeseq
if (0) {
DA_metagenomeseq <- run_metagenomeseq(
                        data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                        data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                        group = "Group",
                        group_names = c("AA", "BB"),
                        norm = "CSS",
                        method = "ZILN")

DA_metagenomeseq <- run_metagenomeseq(
                        ps = dada2_ps,
                        taxa_level = "Genus",
                        group = "Group",
                        group_names = c("AA", "BB"),
                        norm = "CSS",
                        method = "ZILN")

DA_metagenomeseq<- run_da(
                      ps = dada2_ps_genus_filter_trim,
                      group = "Group",
                      group_names = c("AA", "BB"),
                      da_method = "metagenomeseq",
                      norm = "CSS",
                      method = "ZILN")

head(DA_metagenomeseq)
}

10.3.10 DESeq2

DESeq2 package is from Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 (Love, Huber, and Anders 2014). Differential expression analysis based on the Negative Binomial (a.k.a. Gamma-Poisson) distribution.(Caution: the otu_table must be integers).

DA_deseq2 <- run_deseq2(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"))

colnames(DA_deseq2)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "Pvalue"                            "AdjustedPvalue"                    "logFC"                            
##  [7] "Statistic"                         "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"          
## [10] "Median Abundance\nAA"              "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"  
## [13] "Mean Abundance\n(All)"             "Mean Abundance\nAA"                "Mean Abundance\nBB"               
## [16] "Occurrence (100%)\n(All)"          "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"            
## [19] "Odds Ratio (95% CI)"
head(DA_deseq2)
##               TaxaID         Block Enrichment      Pvalue AdjustedPvalue      logFC  Statistic
## 1 g__Acidaminococcus 9_AA vs 14_BB  Nonsignif 0.003314859     0.02209906 -3.1129168 -2.9369235
## 2     g__Actinomyces 9_AA vs 14_BB  Nonsignif 0.122064731     0.31069563 -1.3609640 -1.5461650
## 3   g__Adlercreutzia 9_AA vs 14_BB         BB 0.008736333     0.04408614 -2.4112622 -2.6222032
## 4    g__Agathobacter 9_AA vs 14_BB  Nonsignif 0.301287313     0.51946088  1.1696067  1.0336767
## 5     g__Akkermansia 9_AA vs 14_BB  Nonsignif 0.395896745     0.60907192  0.5625950  0.8489722
## 6       g__Alistipes 9_AA vs 14_BB  Nonsignif 0.535255838     0.68622543  0.5817917  0.6200030
##   Log2FoldChange (Median)\nAA_vs_BB Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB
## 1                                NA                       0                    0                  0.0
## 2                        -0.9577718                      30                   26                 50.5
## 3                                NA                       0                    0                  9.0
## 4                         0.1524904                     316                  329                296.0
## 5                                NA                       0                    0                  0.0
## 6                        -0.1085245                      64                   64                 69.0
##   Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All) Mean Abundance\nAA Mean Abundance\nBB Occurrence (100%)\n(All)
## 1                       1.7051442              58.82609         101.777778           31.21429                    21.74
## 2                      -1.3107676              50.91304          26.777778           66.42857                    82.61
## 3                      -2.4683647              19.21739           5.111111           28.28571                    43.48
## 4                       1.7488297             996.26087        1740.444444          517.85714                    65.22
## 5                      -2.1676332              93.78261          30.000000          134.78571                    21.74
## 6                       0.2000385             163.86957         177.888889          154.85714                    73.91
##   Occurrence (100%)\nAA Occurrence (100%)\nBB Odds Ratio (95% CI)
## 1                 22.22                 21.43    0.67 (-0.11;1.5)
## 2                 88.89                 78.57       4.1 (6.8;1.3)
## 3                 33.33                 50.00        8.6 (13;4.3)
## 4                 55.56                 71.43     0.39 (-1.4;2.2)
## 5                 22.22                 21.43      1.5 (2.3;0.71)
## 6                 77.78                 71.43     0.88 (0.64;1.1)

Results:

The results comprises more than 10 columns, and the details as follow:

  1. TaxaID: taxa name;

  2. Block: groups’ names and numbers;

  3. Enrichment: enriched direction based on Median Abundance and AdjustedPvalue;

  4. logFC: fitted coefficient represents the fold-change for group AA and group BB;

  5. Statistic: test statistic (negative binomial model);

  6. Pvalue and AdjustedPvalue: significant level of Pvalue and Adjusted-pvalue;

  7. **Log2FoldChange (Median)_vs_BB**: Log2FoldChange (Median Abundance) between group AA and group BB;

  8. Median Abundance (All)/(group AA)/(group BB): Median Abundance in all, group AA and group BB, respectively;

  9. **Log2FoldChange (Mean)_vs_BB**: Log2FoldChange (Mean Abundance) between group AA and group BB;

  10. Mean Abundance (All)/(group AA)/(group BB): Mean Abundance in all, group AA and group BB, respectively;

  11. Occurrence (All)/(group AA)/(group BB): Occurrence in all, group AA and group BB, respectively;

  12. Odds Ratio (95% CI): 95% confidence interval odds ratio between group AA and group BB.

other options for deseq2
if (0) {
DA_deseq2 <- run_deseq2(
                data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                group = "Group",
                group_names = c("AA", "BB"))

DA_deseq2 <- run_deseq2(
                ps = dada2_ps,
                taxa_level = "Genus",
                group = "Group",
                group_names = c("AA", "BB"))

DA_deseq2 <- run_da(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"),
                da_method = "deseq2")

head(DA_deseq2)
}

10.3.11 EdgeR

EdgeR package is from A scaling normalization method for differential expression analysis of RNA-seq data (Robinson and Oshlack 2010). Differential expression analysis based on the Negative Binomial (a.k.a. Gamma-Poisson) distribution. (Caution: the otu_table must be integers).

DA_edger <- run_edger(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"))
EdgeR (BVC distance)

Figure 10.1: EdgeR (BVC distance)

colnames(DA_edger)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "Pvalue"                            "AdjustedPvalue"                    "logFC"                            
##  [7] "logCPM"                            "LR"                                "Log2FoldChange (Median)\nAA_vs_BB"
## [10] "Median Abundance\n(All)"           "Median Abundance\nAA"              "Median Abundance\nBB"             
## [13] "Log2FoldChange (Mean)\nAA_vs_BB"   "Mean Abundance\n(All)"             "Mean Abundance\nAA"               
## [16] "Mean Abundance\nBB"                "Occurrence (100%)\n(All)"          "Occurrence (100%)\nAA"            
## [19] "Occurrence (100%)\nBB"             "Odds Ratio (95% CI)"
head(DA_edger)
##               TaxaID         Block Enrichment     Pvalue AdjustedPvalue      logFC   logCPM         LR
## 1 g__Acidaminococcus 9_AA vs 14_BB  Nonsignif 0.01399827      0.0874892 -3.2265874 11.40297 6.03836132
## 2     g__Actinomyces 9_AA vs 14_BB  Nonsignif 0.06553105      0.2113905  1.7971377 10.73043 3.39155685
## 3   g__Adlercreutzia 9_AA vs 14_BB  Nonsignif 0.02277119      0.1265066  2.1869009  8.75611 5.18587640
## 4    g__Agathobacter 9_AA vs 14_BB  Nonsignif 0.21979989      0.4070368 -1.3582478 13.73655 1.50567724
## 5     g__Akkermansia 9_AA vs 14_BB  Nonsignif 0.09219654      0.2622316  2.3965438 10.73275 2.83559687
## 6       g__Alistipes 9_AA vs 14_BB  Nonsignif 0.81597305      0.9320193 -0.2248336 11.56947 0.05416207
##   Log2FoldChange (Median)\nAA_vs_BB Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB
## 1                                NA                       0                    0                  0.0
## 2                        -0.9577718                      30                   26                 50.5
## 3                                NA                       0                    0                  9.0
## 4                         0.1524904                     316                  329                296.0
## 5                                NA                       0                    0                  0.0
## 6                        -0.1085245                      64                   64                 69.0
##   Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All) Mean Abundance\nAA Mean Abundance\nBB Occurrence (100%)\n(All)
## 1                       1.7051442              58.82609         101.777778           31.21429                    21.74
## 2                      -1.3107676              50.91304          26.777778           66.42857                    82.61
## 3                      -2.4683647              19.21739           5.111111           28.28571                    43.48
## 4                       1.7488297             996.26087        1740.444444          517.85714                    65.22
## 5                      -2.1676332              93.78261          30.000000          134.78571                    21.74
## 6                       0.2000385             163.86957         177.888889          154.85714                    73.91
##   Occurrence (100%)\nAA Occurrence (100%)\nBB Odds Ratio (95% CI)
## 1                 22.22                 21.43    0.67 (-0.11;1.5)
## 2                 88.89                 78.57       4.1 (6.8;1.3)
## 3                 33.33                 50.00        8.6 (13;4.3)
## 4                 55.56                 71.43     0.39 (-1.4;2.2)
## 5                 22.22                 21.43      1.5 (2.3;0.71)
## 6                 77.78                 71.43     0.88 (0.64;1.1)

Results:

The results comprises more than 10 columns, and the details as follow:

  1. TaxaID: taxa name;

  2. Block: groups’ names and numbers;

  3. Enrichment: enriched direction based on Median Abundance and AdjustedPvalue;

  4. logFC: fitted coefficient represents the fold-change for group AA and group BB;

  5. logCPM: is the average expression of all samples for that particular gene across all samples on the log-scale expressed in counts per million (cpm, as calculated by edgeR after normalization);

  6. LR: the signed likelihood ratio test statistic;

  7. Pvalue and AdjustedPvalue: significant level of Pvalue and Adjusted-pvalue;

  8. **Log2FoldChange (Median)_vs_BB**: Log2FoldChange (Median Abundance) between group AA and group BB;

  9. Median Abundance (All)/(group AA)/(group BB): Median Abundance in all, group AA and group BB, respectively;

  10. **Log2FoldChange (Mean)_vs_BB**: Log2FoldChange (Mean Abundance) between group AA and group BB;

  11. Mean Abundance (All)/(group AA)/(group BB): Mean Abundance in all, group AA and group BB, respectively;

  12. Occurrence (All)/(group AA)/(group BB): Occurrence in all, group AA and group BB, respectively;

  13. Odds Ratio (95% CI): 95% confidence interval odds ratio between group AA and group BB.

other options for EdgeR
if (0) {
DA_edger <- run_edger(
                data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                group = "Group",
                group_names = c("AA", "BB"))

DA_edger <- run_edger(
                ps = dada2_ps,
                taxa_level = "Genus",
                group = "Group",
                group_names = c("AA", "BB"))

DA_edger <- run_da(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"),
                da_method = "edger")

head(DA_edger)
}

10.3.12 ANCOM

ANCOM (Analysis of composition of microbiomes) is from Analysis of composition of microbiomes: a novel method for studying microbial composition”, Microbial Ecology in Health (Mandal et al. 2015). ANCOM makes no distributional assumptions and can be implemented in a linear model framework. (Caution: the otu_table must be integers).

DA_ancom <- run_ancom(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"))
ANCOM (Structure Zero)

Figure 10.2: ANCOM (Structure Zero)

colnames(DA_ancom)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "(W)q-values < alpha"               "W_ratio"                          
##  [7] "detected_0.7"                      "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"          
## [10] "Median Abundance\nAA"              "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"  
## [13] "Mean Abundance\n(All)"             "Mean Abundance\nAA"                "Mean Abundance\nBB"               
## [16] "Occurrence (100%)\n(All)"          "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"            
## [19] "Odds Ratio (95% CI)"
head(DA_ancom)
##               TaxaID         Block Enrichment  EffectSize (W)q-values < alpha W_ratio detected_0.7
## 1 g__Acidaminococcus 9_AA vs 14_BB  Nonsignif  0.02320439                   0       0        FALSE
## 2     g__Actinomyces 9_AA vs 14_BB  Nonsignif  0.73520463                   0       0        FALSE
## 3   g__Adlercreutzia 9_AA vs 14_BB  Nonsignif  0.30465472                   0       0        FALSE
## 4    g__Agathobacter 9_AA vs 14_BB  Nonsignif -0.48457752                   0       0        FALSE
## 5     g__Akkermansia 9_AA vs 14_BB  Nonsignif -0.07760459                   0       0        FALSE
## 6       g__Alistipes 9_AA vs 14_BB  Nonsignif -0.01131933                   0       0        FALSE
##   Log2FoldChange (Median)\nAA_vs_BB Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB
## 1                                NA                       0                    0                  0.0
## 2                        -0.9577718                      30                   26                 50.5
## 3                                NA                       0                    0                  9.0
## 4                         0.1524904                     316                  329                296.0
## 5                                NA                       0                    0                  0.0
## 6                        -0.1085245                      64                   64                 69.0
##   Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All) Mean Abundance\nAA Mean Abundance\nBB Occurrence (100%)\n(All)
## 1                       1.7051442              58.82609         101.777778           31.21429                    21.74
## 2                      -1.3107676              50.91304          26.777778           66.42857                    82.61
## 3                      -2.4683647              19.21739           5.111111           28.28571                    43.48
## 4                       1.7488297             996.26087        1740.444444          517.85714                    65.22
## 5                      -2.1676332              93.78261          30.000000          134.78571                    21.74
## 6                       0.2000385             163.86957         177.888889          154.85714                    73.91
##   Occurrence (100%)\nAA Occurrence (100%)\nBB Odds Ratio (95% CI)
## 1                 22.22                 21.43    0.67 (-0.11;1.5)
## 2                 88.89                 78.57       4.1 (6.8;1.3)
## 3                 33.33                 50.00        8.6 (13;4.3)
## 4                 55.56                 71.43     0.39 (-1.4;2.2)
## 5                 22.22                 21.43      1.5 (2.3;0.71)
## 6                 77.78                 71.43     0.88 (0.64;1.1)

Results:

The results comprises more than 10 columns, and the details as follow:

  1. TaxaID: taxa name;

  2. Block: groups’ names and numbers;

  3. Enrichment: enriched direction based on Median Abundance and AdjustedPvalue;

  4. EffectSize: effect size by ANCOM;

  5. (W)q-values < alpha: q-values less than alpha;

  6. W_ratio: the ratio of W values;

  7. detected_0.7: W_ratio more than 0.7;

  8. **Log2FoldChange (Median)_vs_BB**: Log2FoldChange (Median Abundance) between group AA and group BB;

  9. Median Abundance (All)/(group AA)/(group BB): Median Abundance in all, group AA and group BB, respectively;

  10. **Log2FoldChange (Mean)_vs_BB**: Log2FoldChange (Mean Abundance) between group AA and group BB;

  11. Mean Abundance (All)/(group AA)/(group BB): Mean Abundance in all, group AA and group BB, respectively;

  12. Occurrence (All)/(group AA)/(group BB): Occurrence in all, group AA and group BB, respectively;

  13. Odds Ratio (95% CI): 95% confidence interval odds ratio between group AA and group BB.

other options for ANCOM
if (0) {
DA_ancom <- run_ancom(
                data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                group = "Group",
                group_names = c("AA", "BB"))

DA_ancom <- run_ancom(
                ps = dada2_ps,
                taxa_level = "Genus",
                group = "Group",
                group_names = c("AA", "BB"))

DA_ancom <- run_da(
                ps = dada2_ps_genus_filter_trim,
                group = "Group",
                group_names = c("AA", "BB"),
                da_method = "ancom")

head(DA_ancom)
}

10.3.13 Corncob

Corncob package is from Modeling microbial abundances and dysbiosis with beta-binomial regression”, Microbial Ecology in Health (Martin, Witten, and Willis 2020). Corncob is based on beta-binomial regression. (Caution: the otu_table must be integers).

if (0) {
DA_corncob <- run_corncob(
                  ps = dada2_ps_genus_filter_trim,
                  group = "Group",
                  group_names = c("AA", "BB"),
                  method = "Wald")

colnames(DA_ancom)
head(DA_ancom)
}
other options for Corncob
if (0) {
DA_corncob <- run_corncob(
                  data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                  data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                  group = "Group",
                  group_names = c("AA", "BB"),
                  method = "Wald")

DA_corncob <- run_corncob(
                  ps = dada2_ps,
                  taxa_level = "Genus",
                  group = "Group",
                  group_names = c("AA", "BB"),
                  method = "Wald")

DA_corncob <- run_da(
                  ps = dada2_ps_genus_filter_trim,
                  group = "Group",
                  group_names = c("AA", "BB"),
                  da_method = "corncob",
                  method = "Wald")

head(DA_corncob)
}

10.3.14 Maaslin2 (Microbiome Multivariable Association with Linear Models)

Maaslin2 package is from Multivariable association discovery in population-scale meta-omics studies (Mallick et al. 2021). Maaslin2 relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal, along with a variety of filtering, normalization, and transform methods.

if (0) {
DA_maaslin2 <- run_maaslin2(
                  ps = dada2_ps_genus_filter_trim,
                  group = "Group",
                  group_names = c("AA", "BB"),
                  transform = "LOG",
                  norm = "TMM",
                  method = "LM",
                  outdir = "./demo_output")

DA_maaslin2 <- run_maaslin2(
                  ps = dada2_ps,
                  taxa_level = "Genus",
                  group = "Group",
                  group_names = c("AA", "BB"),
                  transform = "LOG",
                  norm = "TMM",
                  method = "LM",
                  outdir = "./demo_output")

DA_maaslin2 <- run_maaslin2(
                  data_otu = phyloseq::otu_table(dada2_ps_genus_filter_trim),
                  data_sam = phyloseq::sample_data(dada2_ps_genus_filter_trim),
                  group = "Group",
                  group_names = c("AA", "BB"),
                  transform = "LOG",
                  norm = "TMM",
                  method = "LM",
                  outdir = "./demo_output")

DA_maaslin2 <- run_da(
                  ps = dada2_ps_genus_filter_trim,
                  group = "Group",
                  group_names = c("AA", "BB"),
                  da_method = "maaslin2",
                  transform = "LOG",
                  norm = "TMM",
                  method = "LM",
                  outdir = "./demo_output")

head(DA_maaslin2)
}

10.3.15 LOCOM

LOCOM: A logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control (Hu, Satten, and Hu 2022).

Cautions: It reports an error information which the matrix is singularError when user uses low prev.cut.

library(permute)
DA_locom <- run_locom(ps = dada2_ps_genus_filter_trim,
                    group = "Group",
                    group_names = c("AA", "BB"),
                    fdr.nominal = 0.2,
                    prev.cut = 0.8)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## 88 OTU(s) with fewer than 18.4 in all samples are removed 
## permutations: 1
DA_locom
##                 TaxaID         Block EffectSize Pvalue AdjustedPvalue Log2FoldChange (Median)\nAA_vs_BB Median Abundance\n(All)
## 1       g__Bacteroides 9_AA vs 14_BB      1.030  0.024          0.114                         1.5367266                    3807
## 2           g__Blautia 9_AA vs 14_BB      0.570  0.038          0.114                         0.2641697                    8199
## 3 g__Lachnoclostridium 9_AA vs 14_BB      0.672  0.032          0.114                         0.7874146                     422
## 4     g__Lactobacillus 9_AA vs 14_BB     -3.180  0.022          0.114                        -3.0334230                      56
##   Median Abundance\nAA Median Abundance\nBB Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All) Mean Abundance\nAA
## 1                 9103               3137.5                       0.9841180             5745.5652         8219.44444
## 2                 8645               7198.5                       0.3178904             9145.9565        10397.55556
## 3                  485                281.0                       0.4658394              417.9565          502.33333
## 4                   24                196.5                      -5.0905713              791.2609           37.44444
##   Mean Abundance\nBB Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB       Odds Ratio (95% CI)
## 1          4155.2143                   100.00                100.00                100.00             0.45 (-1.1;2)
## 2          8341.3571                   100.00                100.00                100.00           0.62 (-0.3;1.5)
## 3           363.7143                    95.65                100.00                 92.86          0.63 (-0.27;1.5)
## 4          1275.8571                    86.96                 77.78                 92.86 1.8e+08 (1.8e+08;1.8e+08)

Results:

The results comprises more than 10 columns, and the details as follow:

  1. TaxaID: taxa name;

  2. Block: groups’ names and numbers;

  3. EffectSize: effect size by ANCOM;

  4. Pvalue: p-values for OTU-specific tests;

  5. AdjustedPvalue : q-values (adjusted p-values by the BH procedure) for OTU-specific tests;

  6. **Log2FoldChange (Median)_vs_BB**: Log2FoldChange (Median Abundance) between group AA and group BB;

  7. Median Abundance (All)/(group AA)/(group BB): Median Abundance in all, group AA and group BB, respectively;

  8. **Log2FoldChange (Mean)_vs_BB**: Log2FoldChange (Mean Abundance) between group AA and group BB;

  9. Mean Abundance (All)/(group AA)/(group BB): Mean Abundance in all, group AA and group BB, respectively;

  10. Occurrence (All)/(group AA)/(group BB): Occurrence in all, group AA and group BB, respectively;

  11. Odds Ratio (95% CI): 95% confidence interval odds ratio between group AA and group BB.

10.4 Metagenomic sequencing microbial data (metaphlan2/3)

10.4.1 Wilcoxon Rank Sum and Signed Rank Tests

Wilcoxon Rank Sum and Signed Rank Tests, which are nonparameter test methods, use the rank of taxa abundance to find the significant taxa.

DA_wilcox_mgs <- run_wilcox(
                    ps = metaphlan2_ps_genus_filter_trim,
                    group = "Group",
                    group_names = c("AA", "BB"))
colnames(DA_wilcox_mgs)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "Statistic"                         "Pvalue"                           
##  [7] "AdjustedPvalue"                    "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"          
## [10] "Median Abundance\nAA"              "Median Abundance\nBB"              "Log2FoldChange (Rank)\nAA_vs_BB"  
## [13] "Mean Rank Abundance\nAA"           "Mean Rank Abundance\nBB"           "Occurrence (100%)\n(All)"         
## [16] "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"             "Odds Ratio (95% CI)"
head(DA_wilcox_mgs)
##               TaxaID         Block Enrichment EffectSize Statistic     Pvalue AdjustedPvalue Log2FoldChange (Median)\nAA_vs_BB
## 1 g__Acidaminococcus 7_AA vs 15_BB  Nonsignif  0.9253506      36.0 0.19074499      0.4830467                                NA
## 2     g__Actinomyces 7_AA vs 15_BB  Nonsignif  4.3383273      25.0 0.04564844      0.1896166                                NA
## 3   g__Adlercreutzia 7_AA vs 15_BB  Nonsignif  0.4045684      28.0 0.05769072      0.2076866                                NA
## 4       g__Alistipes 7_AA vs 15_BB  Nonsignif  1.3860231      71.0 0.20193465      0.4830467                          2.061604
## 5    g__Anaerostipes 7_AA vs 15_BB  Nonsignif  0.5796404      31.0 0.13834560      0.3931928                         -4.964322
## 6   g__Anaerotruncus 7_AA vs 15_BB  Nonsignif  0.8015454      47.5 0.71288708      0.8019980                                NA
##   Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB Log2FoldChange (Rank)\nAA_vs_BB Mean Rank Abundance\nAA
## 1              0.00000000            0.0000000            0.0000000                      -0.4631577                    9.14
## 2              0.00000280            0.0000000            0.0000061                      -0.8163116                    7.57
## 3              0.00000000            0.0000000            0.0000558                      -0.7147950                    8.00
## 4              0.00711780            0.0236665            0.0056693                       0.4613459                   14.14
## 5              0.00031435            0.0000229            0.0007149                      -0.6171177                    8.43
## 6              0.00000000            0.0000000            0.0000000                      -0.1327552                   10.79
##   Mean Rank Abundance\nBB Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB Odds Ratio (95% CI)
## 1                   12.60                    36.36                 14.29                 46.67    8700 (8700;8700)
## 2                   13.33                    54.55                 14.29                 73.33      1.7 (2.7;0.66)
## 3                   13.13                    40.91                 14.29                 53.33       Inf (Inf;Inf)
## 4                   10.27                    77.27                 71.43                 80.00     0.22 (-2.7;3.2)
## 5                   12.93                    86.36                 71.43                 93.33         6 (9.5;2.5)
## 6                   11.83                    36.36                 28.57                 40.00      1.3 (1.8;0.78)
wilcox Metagenomic sequencing in run_da
if (0) {
DA_wilcox_mgs <- run_da(
                    ps = metaphlan2_ps_genus_filter_trim,
                    group = "Group",
                    group_names = c("AA", "BB"),
                    da_method = "wilcox")

}

10.4.2 Liner discriminant analysis (LDA) effect size (LEfSe)

LEfSe method is from Metagenomic biomarker discovery and explanation (Segata et al. 2011). It uses Liner discriminant analysis model to identify Differential Taxa.

DA_lefse_mgs <- run_lefse(
                    ps = metaphlan2_ps_genus_filter_trim,
                    group = "Group",
                    group_names = c("AA", "BB"),
                    norm = "CPM",
                    Lda = 2)  

colnames(DA_lefse_mgs)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "LDA_Score"                         "EffectSize"                        "Log2FoldChange (Median)\nAA_vs_BB"
##  [7] "Median Abundance\n(All)"           "Median Abundance\nAA"              "Median Abundance\nBB"             
## [10] "Log2FoldChange (Mean)\nAA_vs_BB"   "Mean Abundance\n(All)"             "Mean Abundance\nAA"               
## [13] "Mean Abundance\nBB"                "Occurrence (100%)\n(All)"          "Occurrence (100%)\nAA"            
## [16] "Occurrence (100%)\nBB"             "Odds Ratio (95% CI)"
head(DA_lefse_mgs)
##               TaxaID         Block Enrichment LDA_Score EffectSize Log2FoldChange (Median)\nAA_vs_BB Median Abundance\n(All)
## 1     g__Actinomyces 7_AA vs 15_BB         BB  2.233078   1.226382                                NA            2.866622e+00
## 2     g__Bacteroides 7_AA vs 15_BB         AA -5.081345   2.546304                          1.190750            2.541675e+05
## 3 g__Bifidobacterium 7_AA vs 15_BB         BB  4.938251   2.809341                         -6.573979            3.906942e+04
## 4         g__Blautia 7_AA vs 15_BB         BB  4.137194   0.559013                         -3.203536            2.181765e+04
## 5     g__Collinsella 7_AA vs 15_BB         BB  3.837502   2.585562                                NA            7.735890e+03
## 6           g__Dorea 7_AA vs 15_BB         BB  3.830972   2.149879                         -5.725323            8.554293e+03
##   Median Abundance\nAA Median Abundance\nBB Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All) Mean Abundance\nAA
## 1               0.0000         7.245027e+00                       -1.246758              29.78669           15.38419
## 2          464614.8499         2.035362e+05                        1.141368          291673.92734       464989.41139
## 3             917.8126         8.744173e+04                       -3.459698          114201.49672        14604.58489
## 4            3704.8960         3.413002e+04                       -2.334726           25756.09859         6854.31058
## 5               0.0000         1.164172e+04                       -2.928116           12495.99111         2268.85954
## 6             270.5150         1.431151e+04                       -3.454344           11339.49845         1455.31524
##   Mean Abundance\nBB Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB Odds Ratio (95% CI)
## 1           36.50785                    54.55                 14.29                 73.33      1.7 (2.8;0.66)
## 2       210793.36812                   100.00                100.00                100.00     0.32 (-1.9;2.5)
## 3       160680.05558                   100.00                100.00                100.00       150 (160;140)
## 4        34576.93300                   100.00                100.00                100.00         91 (100;82)
## 5        17268.65252                    68.18                 42.86                 80.00          20 (25;14)
## 6        15952.11728                    90.91                 71.43                100.00          28 (34;21)
lefse Metagenomic sequencing in run_da
if(0) {
DA_lefse_mgs <- run_da(
                    ps = metaphlan2_ps_genus_filter_trim,
                    group = "Group",
                    group_names = c("AA", "BB"),
                    da_method = "lefse",
                    norm = "CPM",
                    Lda = 0)
}

10.4.3 t-test

T test, a parametric test method, identifies the significant taxa.

DA_ttest_mgs <- run_ttest(
                    ps = metaphlan2_ps_genus_filter_trim,
                    group = "Group",
                    group_names = c("AA", "BB"))
colnames(DA_ttest_mgs)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "Statistic"                         "Pvalue"                           
##  [7] "AdjustedPvalue"                    "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"          
## [10] "Median Abundance\nAA"              "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"  
## [13] "Mean Abundance\n(All)"             "Mean Abundance\nAA"                "Mean Abundance\nBB"               
## [16] "Occurrence (100%)\n(All)"          "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"            
## [19] "Odds Ratio (95% CI)"
head(DA_ttest_mgs)
##               TaxaID         Block Enrichment EffectSize  Statistic     Pvalue AdjustedPvalue Log2FoldChange (Median)\nAA_vs_BB
## 1 g__Acidaminococcus 7_AA vs 15_BB  Nonsignif  0.9253506 -1.0515590 0.31074638      0.5630606                                NA
## 2     g__Actinomyces 7_AA vs 15_BB  Nonsignif  4.3383273 -1.0131776 0.32823254      0.5717599                                NA
## 3   g__Adlercreutzia 7_AA vs 15_BB  Nonsignif  0.4045684 -2.1929042 0.04570343      0.2243623                                NA
## 4       g__Alistipes 7_AA vs 15_BB  Nonsignif  1.3860231  1.5963681 0.15951558      0.5060914                          2.061604
## 5    g__Anaerostipes 7_AA vs 15_BB  Nonsignif  0.5796404 -1.6970563 0.10937430      0.4218723                         -4.964322
## 6   g__Anaerotruncus 7_AA vs 15_BB  Nonsignif  0.8015454 -0.5871152 0.56369794      0.6617324                                NA
##   Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All)
## 1              0.00000000            0.0000000            0.0000000                      -4.6280567          9.548995e-03
## 2              0.00000280            0.0000000            0.0000061                      -1.2053189          2.846364e-05
## 3              0.00000000            0.0000000            0.0000558                      -9.2633214          8.266409e-04
## 4              0.00711780            0.0236665            0.0056693                       2.1761109          1.995760e-02
## 5              0.00031435            0.0000229            0.0007149                      -2.2827694          1.665177e-03
## 6              0.00000000            0.0000000            0.0000000                      -0.9025366          2.137000e-04
##   Mean Abundance\nAA Mean Abundance\nBB Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB
## 1       5.558857e-04       0.0137457800                    36.36                 14.29                 46.67
## 2       1.505714e-05       0.0000347200                    54.55                 14.29                 73.33
## 3       1.971429e-06       0.0012114867                    40.91                 14.29                 53.33
## 4       4.254910e-02       0.0094149000                    77.27                 71.43                 80.00
## 5       4.579714e-04       0.0022285400                    86.36                 71.43                 93.33
## 6       1.341714e-04       0.0002508133                    36.36                 28.57                 40.00
##   Odds Ratio (95% CI)
## 1    8700 (8700;8700)
## 2      1.7 (2.7;0.66)
## 3       Inf (Inf;Inf)
## 4     0.22 (-2.7;3.2)
## 5         6 (9.5;2.5)
## 6      1.3 (1.8;0.78)
ttest Metagenomic sequencing in run_da
if(0) {
DA_ttest_mgs <- run_da(
                    ps = metaphlan2_ps_genus_filter_trim,
                    group = "Group",
                    group_names = c("AA", "BB"),
                    da_method = "ttest")
}

10.4.4 Maaslin2 (Microbiome Multivariable Association with Linear Models)

Maaslin2 package is from Multivariable association discovery in population-scale meta-omics studies (Mallick et al. 2021). Maaslin2 relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal, along with a variety of filtering, normalization, and transform methods.

if (0) {
DA_maaslin2_mgs <- run_maaslin2(
                      ps = metaphlan2_ps_genus_filter_trim,
                      group = "Group",
                      group_names = c("AA", "BB"),
                      transform = "NONE",
                      norm = "NONE",
                      method = "LM",
                      outdir = "./demo_output")

head(DA_maaslin2_mgs)
}
maaslin2 Metagenomic sequencing in run_da
if (0) {
DA_maaslin2_mgs <- run_da(
                      ps = metaphlan2_ps_genus_filter_trim,
                      group = "Group",
                      group_names = c("AA", "BB"),
                      da_method = "maaslin2",
                      transform = "NONE",
                      norm = "NONE",
                      method = "LM",
                      outdir = "./demo_output")
}

10.5 Visualization

The Volcano plot is used to display differential analysis. plot_volcano provides multiple parameters for plotting volcano. More details to see help(plot_volcano).

The barplot is used to display the lefse results.

10.5.1 Volcano plot

The X and Y coordinate axis are flexible to choose for Volcano. Here, we choose logFC and AdjustedPvalue to visualize the results

DA_limma_voom <- run_limma_voom(
                    ps = dada2_ps_genus_filter_trim,
                    group = "Group",
                    group_names = c("AA", "BB"))

colnames(DA_limma_voom)
##  [1] "TaxaID"                            "Block"                             "Enrichment"                       
##  [4] "EffectSize"                        "logFC"                             "Pvalue"                           
##  [7] "AdjustedPvalue"                    "Log2FoldChange (Median)\nAA_vs_BB" "Median Abundance\n(All)"          
## [10] "Median Abundance\nAA"              "Median Abundance\nBB"              "Log2FoldChange (Mean)\nAA_vs_BB"  
## [13] "Mean Abundance\n(All)"             "Mean Abundance\nAA"                "Mean Abundance\nBB"               
## [16] "Occurrence (100%)\n(All)"          "Occurrence (100%)\nAA"             "Occurrence (100%)\nBB"            
## [19] "Odds Ratio (95% CI)"
DA_limma_voom_volcano <- plot_volcano(
                          da_res = DA_limma_voom,
                          group_names = c("AA", "BB"),
                          x_index = "logFC",
                          x_index_cutoff = 0.5,
                          y_index = "Pvalue",
                          y_index_cutoff = 0.05,
                          group_color = c("red", "grey", "blue"),
                          topN = 5)

DA_limma_voom_volcano
Volcano (limma-voom: logFC and AdjustedPvalue)

Figure 10.3: Volcano (limma-voom: logFC and AdjustedPvalue)

DA_limma_voom_volcano effectsize
plot_volcano(
    da_res = DA_limma_voom,
    group_names = c("AA", "BB"),
    x_index = "EffectSize",
    x_index_cutoff = 1,
    y_index = "Pvalue",
    y_index_cutoff = 0.05,
    group_color = c("red", "grey", "blue"),
    topN = 8)
Volcano (limma-voom: EffectSize and Pvalue)

Figure 10.4: Volcano (limma-voom: EffectSize and Pvalue)

The logFC and EffectSize are the same values in limma-voom.

10.5.2 barplot in lefse

DA_lefse <- run_lefse(
              ps = dada2_ps_genus_filter_trim,
              group = "Group",
              group_names = c("AA", "BB"),
              norm = "CPM",
              Lda = 2)

# # don't run this code when you do lefse in reality
# DA_lefse$LDA_Score <- DA_lefse$LDA_Score * 1000

plot_lefse(
  da_res = DA_lefse,
  x_index = "LDA_Score",
  x_index_cutoff = 1,
  group_color = c("green", "red")) 
Barplot (Lefse)

Figure 10.5: Barplot (Lefse)

10.6 Dominant taxa

Display the significant taxa with selection using boxplot.

DA_wilcox <- run_wilcox(
               ps = metaphlan2_ps_genus_filter_trim,
               group = "Group",
               group_names = c("AA", "BB"))

plot_topN_boxplot(
    ps = metaphlan2_ps_genus_filter_trim,
    da_res = DA_wilcox,
    x_index = "Log2FoldChange (Median)\nAA_vs_BB",
    x_index_cutoff = 0.2,
    y_index = "Pvalue",
    y_index_cutoff = 0.3,
    topN = 3,
    taxa_name = "s__Ruminococcus_torques",
    group = "Group")
Dominant Taxa

Figure 10.6: Dominant Taxa

10.7 Multiple differential analysis by one function

here, we provide the run_multiple_da for obtaining the results list from multiple differential analysis methods.

multiple_res <- run_multiple_da(
                  ps = dada2_ps_genus_filter_trim,
                  group = "Group",
                  group_names = c("AA", "BB"),
                  da_method = c("aldex", "limma_voom", "mbzinb", "omnibus"),
                  p_adjust = "none")
## |------------(25%)----------(50%)----------(75%)----------|
## Start GMPR normalization ...
## Start Winsorization ...
## Perform filtering ...
## --A total of  92  taxa will be tested with a sample size of 23 !
## --Omnibus test is selected!
## --Dispersion is treated as a parameter of interest!
## Start testing ...
## 10 %
## 20 %
## 30 %
## 40 %
## 50 %
## 60 %
## 70 %
## 80 %
## 90 %
## 100%!
## Handle failed taxa using permutation test!
## Permutation test ....
## Completed!
names(multiple_res)
## [1] "aldex"      "limma_voom" "mbzinb"     "omnibus"
  • plot results
plot_multiple_DA(
    Multip_DA_res = multiple_res,
    x_index_list = c("EffectSize", "logFC", "mean.LFC", "abund.LFC.CompvarBB.est"),
    x_index_cutoff = 0,
    y_index = "AdjustedPvalue",
    y_index_cutoff = 0.5,
    cellwidth = 50,
    cellheight = 10)
Multiple DA results

Figure 10.7: Multiple DA results

10.8 Comparing outputs from XMAS2, lefse-conda, and lefse-galaxy using the same in-house datasets (amplicon_ps|Zeybel_Gut)

10.8.1 Introduction

In this document, Comparing the output from lefse through different applications:

  • XMAS2 (R package)
  • lefse-conda (command line)
  • lefse-galaxy (from the galaxy platfrom)

In all cases, using the same dataset, amplicon_ps and Zeybel_Gut, which are included in the XMAS package.

library(XMAS2)
library(dplyr)
library(ggplot2)
library(devtools)
library(tibble)
library(tidyr)
library(magrittr)
library(readr)
library(VennDiagram)
library(purrr)

# rm(list = ls())
options(stringsAsFactors = F)
options(future.globals.maxSize = 1000 * 1024^2)

10.8.2 Dataset

10.8.2.1 16s genus

data("amplicon_ps")
amplicon_ps_genus <- summarize_taxa(amplicon_ps, taxa_level = "Genus")
amplicon_ps_genus
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 406 taxa and 34 samples ]
## sample_data() Sample Data:       [ 34 samples by 8 sample variables ]
## tax_table()   Taxonomy Table:    [ 406 taxa by 6 taxonomic ranks ]

10.8.2.2 metagenomics species

data("Zeybel_Gut")
Zeybel_ps_species <- summarize_taxa(Zeybel_Gut, taxa_level = "Species")
Zeybel_ps_species
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 372 taxa and 42 samples ]
## sample_data() Sample Data:       [ 42 samples by 51 sample variables ]
## tax_table()   Taxonomy Table:    [ 372 taxa by 7 taxonomic ranks ]

10.8.2.3 Preparing for lefse galaxy and conda

  • 1st row: class (required)

  • 2nd row: subclass (optional)

  • 3rd row: sampleID (required)

  • rownames: taxon

  • data format: splitted by “

prepare_lefse <- function(ps,
                          Class,
                          Class_names,
                          Subclass = NULL,
                          cutoff = 10) {
    
    # ps = amplicon_ps_genus
    # Class = "SampleType"
    # Class_names = c("gut", "tongue")
    # Subclass = NULL
    # cutoff = 10
    
    sam_tab <- phyloseq::sample_data(ps) %>%
        data.frame()
    colnames(sam_tab)[which(colnames(sam_tab) == Class)] <- "CompClass"
    
    if (is.null(Subclass)) {
        sam_tab_final <- sam_tab %>%
            dplyr::select(CompClass) %>%
            tibble::rownames_to_column("TempRowNames") %>%
            dplyr::filter(CompClass %in% Class_names) %>%
            dplyr::select(all_of(c("TempRowNames", "CompClass"))) %>%
            tibble::column_to_rownames("TempRowNames")
    } else {
        sam_tab_final <- sam_tab %>%
            dplyr::select(all_of(c("CompClass", Subclass))) %>%
            tibble::rownames_to_column("TempRowNames") %>%
            dplyr::filter(CompClass %in% Class_names) %>%
            dplyr::select(all_of(c("TempRowNames", "CompClass", Subclass))) %>%
            tibble::column_to_rownames("TempRowNames")
    }
    
    colnames(sam_tab_final)[which(colnames(sam_tab_final) == "CompClass")] <- Class
    
    phyloseq::sample_data(ps) <- phyloseq::sample_data(sam_tab_final)
    otu_tab <- phyloseq::otu_table(ps) %>%
        data.frame()
    otu_tab_final <- otu_tab[rowSums(otu_tab) > cutoff, colSums(otu_tab) > cutoff, F]
    phyloseq::otu_table(ps) <- phyloseq::otu_table(as.matrix(otu_tab_final), taxa_are_rows = TRUE)
    
    lefse_data <- sam_tab_final %>% 
        tibble::rownames_to_column("Sample") %>%
        dplyr::inner_join(otu_tab_final %>% 
                              t() %>% data.frame() %>%
                              tibble::rownames_to_column("Sample"),
                          by = "Sample") %>%
        dplyr::select(all_of(Class), Sample, all_of(Subclass), everything()) %>%
        #stats::setNames(c(Class, "Sample", Subclass, rownames(otu_tab_final))) %>%
        t() %>% data.frame()
    
    lefse_data_nosub <- sam_tab_final %>% 
        tibble::rownames_to_column("Sample") %>%
        dplyr::inner_join(otu_tab_final %>% 
                              t() %>% data.frame() %>%
                              tibble::rownames_to_column("Sample"),
                          by = "Sample") %>%
        dplyr::select(-Sample) %>%
        dplyr::select(all_of(Class), all_of(Subclass), everything()) %>%
        t() %>% data.frame()
    
    res <- list(ps=ps,
                lefse=lefse_data,
                lefse_nosub=lefse_data_nosub)
    
    return(res)
}

amplicon_ps_genus_lefse <- prepare_lefse(
                          ps = amplicon_ps_genus,
                          Class = "SampleType",
                          Class_names = c("gut", "tongue"),
                          cutoff = 10)

write.table(amplicon_ps_genus_lefse$lefse, "amplicon_ps_genus_lefse.tsv", quote = F, sep = "\t", col.names = F)
write.table(amplicon_ps_genus_lefse$lefse_nosub, "amplicon_ps_genus_lefse_nosub.tsv", quote = F, sep = "\t", col.names = F)

Zeybel_ps_species_lefse <- prepare_lefse(
                          ps = Zeybel_ps_species,
                          Class = "LiverFatClass",
                          Class_names = c("Mild", "Moderate"),
                          cutoff = 1e-4)

write.table(Zeybel_ps_species_lefse$lefse, "Zeybel_ps_species_lefse.tsv", quote = F, sep = "\t", col.names = F)
write.table(Zeybel_ps_species_lefse$lefse_nosub, "Zeybel_ps_species_lefse_nosub.tsv", quote = F, sep = "\t", col.names = F)

10.8.3 Run lefse independently with the three applications (R, conda, galaxy)

10.8.3.1 Running lefse in R (XMAS2)

Perform the analysis with the run_lefse2 function:

  • amplicon_ps_genus
# run_lefse
amplicon_xmas2_output <- run_lefse(
                          ps = amplicon_ps_genus_lefse$ps,
                          group = "SampleType",
                          group_names = c("gut", "tongue"),
                          norm = "CPM") %>% 
    dplyr::mutate(app_name = "xmas_lefse") %>% 
    dplyr::arrange(LDA_Score)
head(amplicon_xmas2_output)
##                            TaxaID             Block Enrichment LDA_Score EffectSize Log2FoldChange (Median)\ngut_vs_tongue
## 1                  g__Bacteroides 8_gut vs 9_tongue        gut -5.475334   5.734982                               8.706188
## 2 g__Lachnospiraceae_unclassified 8_gut vs 9_tongue        gut -4.445346   2.296951                               3.048798
## 3 g__Ruminococcaceae_unclassified 8_gut vs 9_tongue        gut -4.430034   5.962548                               8.833464
## 4                  g__Lachnospira 8_gut vs 9_tongue        gut -4.327495   6.503718                                     NA
## 5        g__Phascolarctobacterium 8_gut vs 9_tongue        gut -4.214235   6.206589                                     NA
## 6             g__Faecalibacterium 8_gut vs 9_tongue        gut -4.136991   5.492180                                     NA
##   Median Abundance\n(All) Median Abundance\ngut Median Abundance\ntongue Log2FoldChange (Mean)\ngut_vs_tongue
## 1               4552.3520             604210.78                 1446.655                             7.995845
## 2              11431.9387              64739.42                 7823.286                             3.186478
## 3                945.6265              53763.30                  117.855                             7.701566
## 4                  0.0000              50368.43                    0.000                                   NA
## 5                  0.0000              18238.84                    0.000                                   NA
## 6                  0.0000              19194.92                    0.000                                   NA
##   Mean Abundance\n(All) Mean Abundance\ngut Mean Abundance\ntongue Occurrence (100%)\n(All) Occurrence (100%)\ngut
## 1             277146.67           586352.50              2297.0454                   100.00                    100
## 2              34604.22            65446.48              7188.8734                   100.00                    100
## 3              26586.43            56192.47               269.9452                    76.47                    100
## 4              22767.22            48380.35                 0.0000                    47.06                    100
## 5              13444.82            28570.25                 0.0000                    47.06                    100
## 6              13991.79            29732.55                 0.0000                    47.06                    100
##   Occurrence (100%)\ntongue Odds Ratio (95% CI)   app_name
## 1                    100.00    4.7e-14 (-60;60) xmas_lefse
## 2                    100.00  4.4e-24 (-110;110) xmas_lefse
## 3                     55.56  7.2e-41 (-180;180) xmas_lefse
## 4                      0.00                <NA> xmas_lefse
## 5                      0.00                <NA> xmas_lefse
## 6                      0.00                <NA> xmas_lefse
# run_lefse2
amplicon_xmas2_output2 <- run_lefse2(
                          ps = amplicon_ps_genus_lefse$ps,
                          group = "SampleType",
                          group_names = c("gut", "tongue"),
                          norm = "CPM") %>% 
    dplyr::mutate(app_name = "xmas_lefse2") %>% 
    dplyr::arrange(LDA_Score)
head(amplicon_xmas2_output2)
##                            TaxaID             Block LDA_Score Enrichment EffectSize       Pvalue
## 1                  g__Bacteroides 8_gut vs 9_tongue -5.763040        gut   5.734982 0.0005320055
## 2 g__Lachnospiraceae_unclassified 8_gut vs 9_tongue -4.774847        gut   2.296951 0.0005320055
## 3 g__Ruminococcaceae_unclassified 8_gut vs 9_tongue -4.756686        gut   5.962548 0.0004911726
## 4                  g__Lachnospira 8_gut vs 9_tongue -4.684106        gut   6.503718 0.0001762277
## 5             g__Faecalibacterium 8_gut vs 9_tongue -4.447011        gut   5.492180 0.0001762277
## 6        g__Phascolarctobacterium 8_gut vs 9_tongue -4.441230        gut   6.206589 0.0001762277
##   Log2FoldChange (Median)\ngut_vs_tongue Median Abundance\n(All) Median Abundance\ngut Median Abundance\ntongue
## 1                               8.706188               4552.3520             604210.78                 1446.655
## 2                               3.048798              11431.9387              64739.42                 7823.286
## 3                               8.833464                945.6265              53763.30                  117.855
## 4                                     NA                  0.0000              50368.43                    0.000
## 5                                     NA                  0.0000              19194.92                    0.000
## 6                                     NA                  0.0000              18238.84                    0.000
##   Log2FoldChange (Mean)\ngut_vs_tongue Mean Abundance\n(All) Mean Abundance\ngut Mean Abundance\ntongue
## 1                             7.995845             277146.67           586352.50              2297.0454
## 2                             3.186478              34604.22            65446.48              7188.8734
## 3                             7.701566              26586.43            56192.47               269.9452
## 4                                   NA              22767.22            48380.35                 0.0000
## 5                                   NA              13991.79            29732.55                 0.0000
## 6                                   NA              13444.82            28570.25                 0.0000
##   Occurrence (100%)\n(All) Occurrence (100%)\ngut Occurrence (100%)\ntongue Odds Ratio (95% CI)    app_name
## 1                   100.00                    100                    100.00    4.7e-14 (-60;60) xmas_lefse2
## 2                   100.00                    100                    100.00  4.4e-24 (-110;110) xmas_lefse2
## 3                    76.47                    100                     55.56  7.2e-41 (-180;180) xmas_lefse2
## 4                    47.06                    100                      0.00                <NA> xmas_lefse2
## 5                    47.06                    100                      0.00                <NA> xmas_lefse2
## 6                    47.06                    100                      0.00                <NA> xmas_lefse2
  • Zeybel_ps_species
# run_lefse
MGS_xmas2_output <- run_lefse(
                          ps = Zeybel_ps_species_lefse$ps,
                          group = "LiverFatClass",
                          group_names = c("Mild", "Moderate"),
                          norm = "CPM") %>% 
    dplyr::mutate(app_name = "xmas_lefse") %>% 
    dplyr::arrange(LDA_Score)
head(MGS_xmas2_output)
##                            TaxaID                  Block Enrichment LDA_Score EffectSize
## 1         s__Butyricimonas_virosa 12_Mild vs 12_Moderate   Moderate  2.445966   1.668501
## 2       s__Bacteroides_salyersiae 12_Mild vs 12_Moderate   Moderate  2.659689   1.573816
## 3           s__Bacteroides_clarus 12_Mild vs 12_Moderate   Moderate  3.191513   2.331714
## 4 s__Bacteroides_thetaiotaomicron 12_Mild vs 12_Moderate   Moderate  3.210748   2.775490
## 5        s__Bacteroides_coprocola 12_Mild vs 12_Moderate   Moderate  3.314238   1.367262
## 6 s__Bacteroides_cellulosilyticus 12_Mild vs 12_Moderate   Moderate  3.408064   3.145457
##   Log2FoldChange (Median)\nMild_vs_Moderate Median Abundance\n(All) Median Abundance\nMild Median Abundance\nModerate
## 1                                        NA               0.8999998                      0                   262.9238
## 2                                        NA               0.0000000                      0                     0.0000
## 3                                        NA               0.0000000                      0                     0.0000
## 4                                        NA             223.8648101                      0                  1123.0082
## 5                                        NA               0.0000000                      0                     0.0000
## 6                                        NA              82.9748375                      0                  2707.7169
##   Log2FoldChange (Mean)\nMild_vs_Moderate Mean Abundance\n(All) Mean Abundance\nMild Mean Abundance\nModerate
## 1                               -2.329681              353.2093             117.2101                 589.2084
## 2                                      NA              262.0187               0.0000                 524.0375
## 3                                      NA             1283.6658               0.0000                2567.3315
## 4                               -2.241224             2413.3440             842.6510                3984.0371
## 5                                      NA             1704.3673               0.0000                3408.7346
## 6                               -3.325019             3170.2327             575.2844                5765.1810
##   Occurrence (100%)\n(All) Occurrence (100%)\nMild Occurrence (100%)\nModerate Odds Ratio (95% CI)   app_name
## 1                    50.00                   25.00                       75.00       3.9 (6.6;1.2) xmas_lefse
## 2                    16.67                    0.00                       33.33                <NA> xmas_lefse
## 3                    20.83                    0.00                       41.67                <NA> xmas_lefse
## 4                    66.67                   41.67                       91.67      3.1 (5.3;0.87) xmas_lefse
## 5                    16.67                    0.00                       33.33                <NA> xmas_lefse
## 6                    62.50                   41.67                       83.33          17 (23;11) xmas_lefse
# run_lefse2
MGS_xmas2_output2 <- run_lefse2(
                          ps = Zeybel_ps_species_lefse$ps,
                          group = "LiverFatClass",
                          group_names = c("Mild", "Moderate"),
                          norm = "CPM") %>% 
    dplyr::mutate(app_name = "xmas_lefse2") %>% 
    dplyr::arrange(LDA_Score)
head(MGS_xmas2_output2)
##                            TaxaID                  Block LDA_Score Enrichment EffectSize     Pvalue
## 1         s__Butyricimonas_virosa 12_Mild vs 12_Moderate  2.709350   Moderate   1.668501 0.02432292
## 2       s__Bacteroides_salyersiae 12_Mild vs 12_Moderate  2.763901   Moderate   1.573816 0.03286923
## 3           s__Bacteroides_clarus 12_Mild vs 12_Moderate  3.354092   Moderate   2.331714 0.01473169
## 4 s__Bacteroides_thetaiotaomicron 12_Mild vs 12_Moderate  3.502262   Moderate   2.775490 0.01050852
## 5     s__Bacteroides_intestinalis 12_Mild vs 12_Moderate  3.524676   Moderate   1.184353 0.03286923
## 6        s__Bacteroides_coprocola 12_Mild vs 12_Moderate  3.538227   Moderate   1.367262 0.03286923
##   Log2FoldChange (Median)\nMild_vs_Moderate Median Abundance\n(All) Median Abundance\nMild Median Abundance\nModerate
## 1                                        NA               0.8999998                      0                   262.9238
## 2                                        NA               0.0000000                      0                     0.0000
## 3                                        NA               0.0000000                      0                     0.0000
## 4                                        NA             223.8648101                      0                  1123.0082
## 5                                        NA               0.0000000                      0                     0.0000
## 6                                        NA               0.0000000                      0                     0.0000
##   Log2FoldChange (Mean)\nMild_vs_Moderate Mean Abundance\n(All) Mean Abundance\nMild Mean Abundance\nModerate
## 1                               -2.329681              353.2093             117.2101                 589.2084
## 2                                      NA              262.0187               0.0000                 524.0375
## 3                                      NA             1283.6658               0.0000                2567.3315
## 4                               -2.241224             2413.3440             842.6510                3984.0371
## 5                                      NA             1745.9150               0.0000                3491.8300
## 6                                      NA             1704.3673               0.0000                3408.7346
##   Occurrence (100%)\n(All) Occurrence (100%)\nMild Occurrence (100%)\nModerate Odds Ratio (95% CI)    app_name
## 1                    50.00                   25.00                       75.00       3.9 (6.6;1.2) xmas_lefse2
## 2                    16.67                    0.00                       33.33                <NA> xmas_lefse2
## 3                    20.83                    0.00                       41.67                <NA> xmas_lefse2
## 4                    66.67                   41.67                       91.67      3.1 (5.3;0.87) xmas_lefse2
## 5                    16.67                    0.00                       33.33                <NA> xmas_lefse2
## 6                    16.67                    0.00                       33.33                <NA> xmas_lefse2

10.8.3.2 Running lefse-conda (command line)

10.8.3.2.0.1 lefse-conda installation and version

Note: I installed lefse following the instructions from this site after installing conda.

## Add channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --add channels biobakery

## Install lefse
conda create -n lefse -c biobakery lefse -y

Conda and lefse versions:

conda --version
#> conda 4.12.0

conda list | grep -e "lefse"
# packages in environment at /home/samuel/miniconda3/envs/lefse:
#> lefse                     1.1.2              pyhdfd78af_0    bioconda
10.8.3.2.1 Run lefse-conda
  1. Generate a tabular dataset (amplicon_ps_genus_lefse or Zeybel_ps_species_lefse) compatible with lefse-conda and lefse-galaxy using the get_dataset.R script.

  2. Run the script run_lefse.sh (linux) with the following parameters:

# In general
# ./run_lefse.sh <path/to/conda/activate> <env_name> <path/to/Rscript> <filename prefix>

# in my case (Hua Zou)

## amplicon_ps_genus_lefse 
./run_lefse.sh /Users/zouhua/opt/anaconda3/bin/activate lefse /usr/local/bin/R amplicon_ps_genus_lefse

## Zeybel_ps_species_lefse 
./run_lefse.sh /Users/zouhua/opt/anaconda3/bin/activate lefse /usr/local/bin/R Zeybel_ps_species_lefse

Note: All script files, get_dataset.R and run_lefse.sh, and this rmarkdown document must be in the same directory.

10.8.3.2.2 Import output from lefse-conda into R
get_lefse_python <- function(datres, 
                            Class_names,
                            name = "lefse_conda",
                            LDA_names = "lefse_conda_LDA",
                            LDA_cutoff = 2) {
    
    # datres = "amplicon_ps_genus_lefse.res"
    # Class_names = c("gut", "tongue")
    # LDA_cutoff = 2
    
    col_names <- c(
        "TaxaID", "log_hi_class_avg", "Enrichment", "lefse_conda_LDA", "pval")
    lefse_conda <- readr::read_tsv(datres, show_col_types = FALSE, col_names = FALSE ) %>% 
        magrittr::set_colnames(col_names) %>% 
        dplyr::filter(!is.na(lefse_conda_LDA)) %>%
        dplyr::mutate(
            lefse_conda_LDA = ifelse(
                Enrichment == Class_names[1], -lefse_conda_LDA, lefse_conda_LDA),
            app_name = name) %>% 
        dplyr::filter(abs(lefse_conda_LDA) >= LDA_cutoff) %>%
        dplyr::arrange(lefse_conda_LDA)
    
    colnames(lefse_conda)[which(colnames(lefse_conda) == "lefse_conda_LDA")] <- LDA_names
    
    return(lefse_conda)
}

amplicon_ps_genus_lefse_conda <- get_lefse_python(
                    datres = "amplicon_ps_genus_lefse.res", 
                    Class_names = c("gut", "tongue"),
                    LDA_names = "lefse_conda_LDA")
head(amplicon_ps_genus_lefse_conda)
## # A tibble: 6 × 6
##   TaxaID                          log_hi_class_avg Enrichment lefse_conda_LDA pval                  app_name   
##   <chr>                                      <dbl> <chr>                <dbl> <chr>                 <chr>      
## 1 g__Bacteroides                              5.77 gut                  -5.47 0.0005320055051392497 lefse_conda
## 2 g__Lachnospiraceae_unclassified             4.82 gut                  -4.47 0.0005320055051392497 lefse_conda
## 3 g__Ruminococcaceae_unclassified             4.75 gut                  -4.46 0.0004911726274481772 lefse_conda
## 4 g__Lachnospira                              4.68 gut                  -4.36 0.0001762276980982556 lefse_conda
## 5 g__Faecalibacterium                         4.47 gut                  -4.20 0.0001762276980982556 lefse_conda
## 6 g__Phascolarctobacterium                    4.46 gut                  -4.15 0.0001762276980982556 lefse_conda
Zeybel_ps_species_lefse_conda <- get_lefse_python(
                    datres = "Zeybel_ps_species_lefse.res", 
                    Class_names = c("Mild", "Moderate"),
                    LDA_names = "lefse_conda_LDA")
head(Zeybel_ps_species_lefse_conda)
## # A tibble: 6 × 6
##   TaxaID                          log_hi_class_avg Enrichment lefse_conda_LDA pval                 app_name   
##   <chr>                                      <dbl> <chr>                <dbl> <chr>                <chr>      
## 1 s__Butyricimonas_virosa                     2.77 Moderate              2.46 0.02432291566171718  lefse_conda
## 2 s__Bacteroides_salyersiae                   2.72 Moderate              2.47 0.032869234300658745 lefse_conda
## 3 s__Bacteroides_clarus                       3.41 Moderate              3.07 0.014731687897477879 lefse_conda
## 4 s__Bacteroides_thetaiotaomicron             3.60 Moderate              3.26 0.01050851971947486  lefse_conda
## 5 s__Bacteroides_coprocola                    3.53 Moderate              3.28 0.032869234300658745 lefse_conda
## 6 s__Bacteroides_intestinalis                 3.54 Moderate              3.32 0.032869234300658745 lefse_conda

10.8.3.3 Running lefse from galaxy

Using the amplicon_ps_genus_lefse_nosub.txt or Zeybel_ps_species_lefse_nosub.txt file (no subjects included) as input for lefse from the galaxy platform of the Huttenhower lab at galaxy.

The conditions as follow:

  • alpha were 0.05 for both KW and Wilcox,

  • 2.0 for LDA.

  • TSS normalization was applied as well.

converting the output into compared format:

  • amplicon_ps_genus_lefse_nosub.res

  • Zeybel_ps_species_lefse_nosub.res

amplicon_ps_genus_lefse_galaxy <- get_lefse_python(
                        datres = "amplicon_ps_genus_lefse_nosub.res",
                        name = "lefse_galaxy",
                        Class_names = c("gut", "tongue"),
                        LDA_names = "lefse_galaxy_LDA")
head(amplicon_ps_genus_lefse_galaxy)
## # A tibble: 6 × 6
##   TaxaID                          log_hi_class_avg Enrichment lefse_galaxy_LDA pval              app_name    
##   <chr>                                      <dbl> <chr>                 <dbl> <chr>             <chr>       
## 1 g__Bacteroides                              5.77 gut                   -5.46 0.000532005505139 lefse_galaxy
## 2 g__Lachnospiraceae_unclassified             4.82 gut                   -4.45 0.000532005505139 lefse_galaxy
## 3 g__Ruminococcaceae_unclassified             4.75 gut                   -4.41 0.000491172627448 lefse_galaxy
## 4 g__Lachnospira                              4.68 gut                   -4.38 0.000176227698098 lefse_galaxy
## 5 g__Phascolarctobacterium                    4.46 gut                   -4.18 0.000176227698098 lefse_galaxy
## 6 g__Faecalibacterium                         4.47 gut                   -4.18 0.000176227698098 lefse_galaxy
Zeybel_ps_species_lefse_galaxy <- get_lefse_python(
                        datres = "Zeybel_ps_species_lefse_nosub.res",
                        name = "lefse_galaxy",
                        Class_names = c("Mild", "Moderate"),
                        LDA_names = "lefse_galaxy_LDA")
head(amplicon_ps_genus_lefse_galaxy)
## # A tibble: 6 × 6
##   TaxaID                          log_hi_class_avg Enrichment lefse_galaxy_LDA pval              app_name    
##   <chr>                                      <dbl> <chr>                 <dbl> <chr>             <chr>       
## 1 g__Bacteroides                              5.77 gut                   -5.46 0.000532005505139 lefse_galaxy
## 2 g__Lachnospiraceae_unclassified             4.82 gut                   -4.45 0.000532005505139 lefse_galaxy
## 3 g__Ruminococcaceae_unclassified             4.75 gut                   -4.41 0.000491172627448 lefse_galaxy
## 4 g__Lachnospira                              4.68 gut                   -4.38 0.000176227698098 lefse_galaxy
## 5 g__Phascolarctobacterium                    4.46 gut                   -4.18 0.000176227698098 lefse_galaxy
## 6 g__Faecalibacterium                         4.47 gut                   -4.18 0.000176227698098 lefse_galaxy

10.8.3.4 Extracting results from XMAS2 results

  • run_lefse (lefser R package)

  • run_lefse2 (microbiomeMarker R package)

get_lefse_R <- function(datres,
                        name = "Rrun_lefse",
                        LDA_names = "lefse_R_LDA",
                        LDA_cutoff = 2) {
    
    # datres = amplicon_xmas2_output
    # name = "Rrun_lefse"
    # LDA_cutoff = 2
    
    col_names <- c(
        "TaxaID", "Block", "Enrichment", "LDA_Score", "EffectSize")
    lefse_R <- datres %>% 
        dplyr::select(all_of(col_names)) %>% 
        dplyr::mutate(app_name = name) %>% 
        dplyr::filter(abs(LDA_Score) >= LDA_cutoff) %>%
        dplyr::arrange(LDA_Score)
    
    colnames(lefse_R)[which(colnames(lefse_R) == "LDA_Score")] <- LDA_names
    
    return(lefse_R)
}

amplicon_ps_genus_lefse_R <- get_lefse_R(
                        datres = amplicon_xmas2_output, 
                        name = "Rrun_lefse",
                        LDA_names = "lefse_R_LDA")
head(amplicon_ps_genus_lefse_R)
##                            TaxaID             Block Enrichment lefse_R_LDA EffectSize   app_name
## 1                  g__Bacteroides 8_gut vs 9_tongue        gut   -5.475334   5.734982 Rrun_lefse
## 2 g__Lachnospiraceae_unclassified 8_gut vs 9_tongue        gut   -4.445346   2.296951 Rrun_lefse
## 3 g__Ruminococcaceae_unclassified 8_gut vs 9_tongue        gut   -4.430034   5.962548 Rrun_lefse
## 4                  g__Lachnospira 8_gut vs 9_tongue        gut   -4.327495   6.503718 Rrun_lefse
## 5        g__Phascolarctobacterium 8_gut vs 9_tongue        gut   -4.214235   6.206589 Rrun_lefse
## 6             g__Faecalibacterium 8_gut vs 9_tongue        gut   -4.136991   5.492180 Rrun_lefse
amplicon_ps_genus_lefse_R2 <- get_lefse_R(
                        datres = amplicon_xmas2_output2, 
                        name = "Rrun_lefse2",
                        LDA_names = "lefse_R2_LDA")
head(amplicon_ps_genus_lefse_R2)
##                            TaxaID             Block Enrichment lefse_R2_LDA EffectSize    app_name
## 1                  g__Bacteroides 8_gut vs 9_tongue        gut    -5.763040   5.734982 Rrun_lefse2
## 2 g__Lachnospiraceae_unclassified 8_gut vs 9_tongue        gut    -4.774847   2.296951 Rrun_lefse2
## 3 g__Ruminococcaceae_unclassified 8_gut vs 9_tongue        gut    -4.756686   5.962548 Rrun_lefse2
## 4                  g__Lachnospira 8_gut vs 9_tongue        gut    -4.684106   6.503718 Rrun_lefse2
## 5             g__Faecalibacterium 8_gut vs 9_tongue        gut    -4.447011   5.492180 Rrun_lefse2
## 6        g__Phascolarctobacterium 8_gut vs 9_tongue        gut    -4.441230   6.206589 Rrun_lefse2
Zeybel_ps_species_lefse_R <- get_lefse_R(
                        datres = MGS_xmas2_output, 
                        name = "Rrun_lefse",
                        LDA_names = "lefse_R_LDA")
head(Zeybel_ps_species_lefse_R)
##                            TaxaID                  Block Enrichment lefse_R_LDA EffectSize   app_name
## 1         s__Butyricimonas_virosa 12_Mild vs 12_Moderate   Moderate    2.445966   1.668501 Rrun_lefse
## 2       s__Bacteroides_salyersiae 12_Mild vs 12_Moderate   Moderate    2.659689   1.573816 Rrun_lefse
## 3           s__Bacteroides_clarus 12_Mild vs 12_Moderate   Moderate    3.191513   2.331714 Rrun_lefse
## 4 s__Bacteroides_thetaiotaomicron 12_Mild vs 12_Moderate   Moderate    3.210748   2.775490 Rrun_lefse
## 5        s__Bacteroides_coprocola 12_Mild vs 12_Moderate   Moderate    3.314238   1.367262 Rrun_lefse
## 6 s__Bacteroides_cellulosilyticus 12_Mild vs 12_Moderate   Moderate    3.408064   3.145457 Rrun_lefse
Zeybel_ps_species_lefse_R2 <- get_lefse_R(
                        datres = MGS_xmas2_output2, 
                        name = "Rrun_lefse2",
                        LDA_names = "lefse_R2_LDA")
head(Zeybel_ps_species_lefse_R2)
##                            TaxaID                  Block Enrichment lefse_R2_LDA EffectSize    app_name
## 1         s__Butyricimonas_virosa 12_Mild vs 12_Moderate   Moderate     2.709350   1.668501 Rrun_lefse2
## 2       s__Bacteroides_salyersiae 12_Mild vs 12_Moderate   Moderate     2.763901   1.573816 Rrun_lefse2
## 3           s__Bacteroides_clarus 12_Mild vs 12_Moderate   Moderate     3.354092   2.331714 Rrun_lefse2
## 4 s__Bacteroides_thetaiotaomicron 12_Mild vs 12_Moderate   Moderate     3.502262   2.775490 Rrun_lefse2
## 5     s__Bacteroides_intestinalis 12_Mild vs 12_Moderate   Moderate     3.524676   1.184353 Rrun_lefse2
## 6        s__Bacteroides_coprocola 12_Mild vs 12_Moderate   Moderate     3.538227   1.367262 Rrun_lefse2

10.8.4 Comparison of lefse-conda with XMAS2

10.8.4.1 Number of features reported as significant

  • amplicon_ps_genus_lefse
plot_signif_taxa_num <- function(dat1, dat2, dat3, dat4) {

    # dat1 = amplicon_ps_genus_lefse_conda
    # dat2 = amplicon_ps_genus_lefse_galaxy
    # dat3 = amplicon_ps_genus_lefse_R
    # dat4 = amplicon_ps_genus_lefse_R2
                     
    combined_outputs <- dplyr::bind_rows(dat1, dat2, dat3, dat4) %>% 
       dplyr::mutate(LDA = coalesce(lefse_conda_LDA, 
                                    lefse_galaxy_LDA, 
                                    lefse_R_LDA, 
                                    lefse_R2_LDA))
    
    pl <- combined_outputs %>% 
        dplyr::count(app_name) %>% 
        ggplot(aes(app_name, n)) +
        geom_col() +
        geom_label(aes(label = n)) +
        ggtitle('Number of significiant features identified by the different applications using lefse') 
    
    return(pl)
}

plot_signif_taxa_num(dat1 = amplicon_ps_genus_lefse_conda, 
                     dat2 = amplicon_ps_genus_lefse_galaxy, 
                     dat3 = amplicon_ps_genus_lefse_R, 
                     dat4 = amplicon_ps_genus_lefse_R2)
Number of significiant features identified by the different applications using lefse (16s)

Figure 10.8: Number of significiant features identified by the different applications using lefse (16s)

  • Zeybel_ps_species_lefse
plot_signif_taxa_num(dat1 = Zeybel_ps_species_lefse_conda, 
                     dat2 = Zeybel_ps_species_lefse_galaxy, 
                     dat3 = Zeybel_ps_species_lefse_R, 
                     dat4 = Zeybel_ps_species_lefse_R2)
Number of significiant features identified by the different applications using lefse (MGS)

Figure 10.9: Number of significiant features identified by the different applications using lefse (MGS)

10.8.4.2 Overlap of features reported as significant

  • amplicon_ps_genus_lefse
plot_signif_taxa_venn <- function(dat1, dat2, dat3, dat4) {
    
    # dat1 = amplicon_ps_genus_lefse_conda
    # dat2 = amplicon_ps_genus_lefse_galaxy
    # dat3 = amplicon_ps_genus_lefse_R
    # dat4 = amplicon_ps_genus_lefse_R2
    
    set1 = dat1$TaxaID
    set2 = dat2$TaxaID
    set3 = dat3$TaxaID
    set4 = dat4$TaxaID
    
    grid.newpage()
    venn_object <- venn.diagram(
        x = list(set1, set2, set3, set4),
        category.names = c("lefse-conda", "lefse-galaxy", 
                           "run_lefse(lefser)", "run_lefse2(microbiomeMarker)"),
        filename = NULL
    )
    grid.draw(venn_object)    
}

plot_signif_taxa_venn(dat1 = amplicon_ps_genus_lefse_conda, 
                      dat2 = amplicon_ps_genus_lefse_galaxy, 
                      dat3 = amplicon_ps_genus_lefse_R, 
                      dat4 = amplicon_ps_genus_lefse_R2)
Venn (16s)

Figure 10.10: Venn (16s)

  • Zeybel_ps_species_lefse
plot_signif_taxa_venn(dat1 = Zeybel_ps_species_lefse_conda, 
                      dat2 = Zeybel_ps_species_lefse_galaxy, 
                      dat3 = Zeybel_ps_species_lefse_R, 
                      dat4 = Zeybel_ps_species_lefse_R2)
Venn (MGS)

Figure 10.11: Venn (MGS)

10.8.4.3 LDA scores’ comparison

LDA scores of the 14 overlapping features are similar.

  • amplicon_ps_genus_lefse
amplicon_joint_output <- purrr::reduce(
    .x = list(amplicon_ps_genus_lefse_conda, amplicon_ps_genus_lefse_galaxy, 
              amplicon_ps_genus_lefse_R, amplicon_ps_genus_lefse_R2),
    .f = ~ inner_join(.x, .y, by = "TaxaID")) %>% 
    dplyr::select(TaxaID, lefse_conda_LDA, lefse_galaxy_LDA, 
                  lefse_R_LDA, lefse_R2_LDA)
amplicon_joint_output
## # A tibble: 57 × 5
##    TaxaID                          lefse_conda_LDA lefse_galaxy_LDA lefse_R_LDA lefse_R2_LDA
##    <chr>                                     <dbl>            <dbl>       <dbl>        <dbl>
##  1 g__Bacteroides                            -5.47            -5.46       -5.48        -5.76
##  2 g__Lachnospiraceae_unclassified           -4.47            -4.45       -4.45        -4.77
##  3 g__Ruminococcaceae_unclassified           -4.46            -4.41       -4.43        -4.76
##  4 g__Lachnospira                            -4.36            -4.38       -4.33        -4.68
##  5 g__Faecalibacterium                       -4.20            -4.18       -4.14        -4.45
##  6 g__Phascolarctobacterium                  -4.15            -4.18       -4.21        -4.44
##  7 g__Clostridiales_unclassified             -4.03            -4.06       -4.02        -4.37
##  8 g__Akkermansia                            -3.91            -4.04       -4.08        -4.22
##  9 g__Oscillospira                           -3.87            -3.89       -3.87        -4.20
## 10 g__Rikenellaceae_unclassified             -3.87            -3.91       -3.85        -4.27
## # … with 47 more rows
  • Zeybel_ps_species_lefse
MGS_joint_output <- purrr::reduce(
    .x = list(Zeybel_ps_species_lefse_conda, Zeybel_ps_species_lefse_galaxy, 
              Zeybel_ps_species_lefse_R, Zeybel_ps_species_lefse_R2),
    .f = ~ inner_join(.x, .y, by = "TaxaID")) %>% 
    dplyr::select(TaxaID, lefse_conda_LDA, lefse_galaxy_LDA, 
                  lefse_R_LDA, lefse_R2_LDA)
MGS_joint_output
## # A tibble: 11 × 5
##    TaxaID                          lefse_conda_LDA lefse_galaxy_LDA lefse_R_LDA lefse_R2_LDA
##    <chr>                                     <dbl>            <dbl>       <dbl>        <dbl>
##  1 s__Butyricimonas_virosa                    2.46             2.82        2.45         2.71
##  2 s__Bacteroides_salyersiae                  2.47             3.16        2.66         2.76
##  3 s__Bacteroides_clarus                      3.07             3.29        3.19         3.35
##  4 s__Bacteroides_thetaiotaomicron            3.26             3.28        3.21         3.50
##  5 s__Bacteroides_coprocola                   3.28             3.24        3.31         3.54
##  6 s__Bacteroides_intestinalis                3.32             3.31        3.41         3.52
##  7 s__Bacteroides_cellulosilyticus            3.41             3.36        3.41         3.71
##  8 s__Bacteroides_eggerthii                   3.52             3.59        3.47         3.80
##  9 s__Parabacteroides_merdae                  3.59             3.68        3.54         3.93
## 10 s__Barnesiella_intestinihominis            3.68             3.72        3.67         3.95
## 11 s__Prevotella_sp_CAG_520                   4.04             4.26        4.38         4.63
10.8.4.3.1 XMAS2 LDA scores vs lefse-conda LDA scores
10.8.4.3.1.1 amplicon_ps_genus_lefse
  • run_lefse (lefser R package) vs lefse-conda
amplicon_joint_output %>% 
    ggplot(aes(lefse_conda_LDA, lefse_R_LDA)) + 
    geom_point(size = 3, shape = 1) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_vline(xintercept = 0, linetype = "dashed") +
    ggtitle("Comparison of LDA scores of features reported as significant 
    by both lefse-conda and run_lefse")
Comparison of LDA scores of features reported as significant by both lefse-conda and run_lefse (16s)

Figure 10.12: Comparison of LDA scores of features reported as significant by both lefse-conda and run_lefse (16s)

  • run_lefse2 (microbiomeMarker R package) vs lefse-conda
amplicon_joint_output %>% 
    ggplot(aes(lefse_conda_LDA, lefse_R2_LDA)) + 
    geom_point(size = 3, shape = 1) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_vline(xintercept = 0, linetype = "dashed") +
    ggtitle("Comparison of LDA scores of features reported as significant 
    by both lefse-conda and run_lefse2")
Comparison of LDA scores of features reported as significant by both lefse-conda and run_lefse2 (16s)

Figure 10.13: Comparison of LDA scores of features reported as significant by both lefse-conda and run_lefse2 (16s)

10.8.4.3.1.2 Zeybel_ps_species_lefse
  • run_lefse (lefser R package) vs lefse-conda
MGS_joint_output %>% 
    ggplot(aes(lefse_conda_LDA, lefse_R_LDA)) + 
    geom_point(size = 3, shape = 1) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_vline(xintercept = 0, linetype = "dashed") +
    ggtitle("Comparison of LDA scores of features reported as significant 
    by both lefse-conda and run_lefse")
Comparison of LDA scores of features reported as significant by both lefse-conda and run_lefse (MGS)

Figure 10.14: Comparison of LDA scores of features reported as significant by both lefse-conda and run_lefse (MGS)

  • run_lefse2 (microbiomeMarker R package) vs lefse-conda
MGS_joint_output %>% 
    ggplot(aes(lefse_conda_LDA, lefse_R2_LDA)) + 
    geom_point(size = 3, shape = 1) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_vline(xintercept = 0, linetype = "dashed") +
    ggtitle("Comparison of LDA scores of features reported as significant 
    by both lefse-conda and run_lefse2")
Comparison of LDA scores of features reported as significant by both lefse-conda and run_lefse2 (MGS)

Figure 10.15: Comparison of LDA scores of features reported as significant by both lefse-conda and run_lefse2 (MGS)

10.8.4.3.2 XMAS2 LDA scores vs lefse-galaxy LDA scores
10.8.4.3.2.1 amplicon_ps_genus_lefse
  • run_lefse (lefser R package) vs lefse-galaxy
amplicon_joint_output %>% 
    ggplot(aes(lefse_galaxy_LDA, lefse_R_LDA)) + 
    geom_point(size = 3, shape = 1) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_vline(xintercept = 0, linetype = "dashed") +
    ggtitle("Comparison of LDA scores of features reported as significant 
    by both lefse-galaxy and run_lefse")
Comparison of LDA scores of features reported as significant by both lefse-galaxy and run_lefse (16s)

Figure 10.16: Comparison of LDA scores of features reported as significant by both lefse-galaxy and run_lefse (16s)

  • run_lefse2 (microbiomeMarker R package) vs lefse-galaxy
amplicon_joint_output %>% 
    ggplot(aes(lefse_galaxy_LDA, lefse_R2_LDA)) + 
    geom_point(size = 3, shape = 1) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_vline(xintercept = 0, linetype = "dashed") +
    ggtitle("Comparison of LDA scores of features reported as significant 
    by both lefse-galaxy and run_lefse2")
Comparison of LDA scores of features reported as significant by both lefse-galaxy and run_lefse2 (16s)

Figure 10.17: Comparison of LDA scores of features reported as significant by both lefse-galaxy and run_lefse2 (16s)

10.8.4.3.2.2 Zeybel_ps_species_lefse
  • run_lefse (lefser R package) vs lefse-galaxy
MGS_joint_output %>% 
    ggplot(aes(lefse_galaxy_LDA, lefse_R_LDA)) + 
    geom_point(size = 3, shape = 1) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_vline(xintercept = 0, linetype = "dashed") +
    ggtitle("Comparison of LDA scores of features reported as significant 
    by both lefse-galaxy and run_lefse")
Comparison of LDA scores of features reported as significant by both lefse-galaxy and run_lefse (MGS)

Figure 10.18: Comparison of LDA scores of features reported as significant by both lefse-galaxy and run_lefse (MGS)

  • run_lefse2 (microbiomeMarker R package) vs lefse-galaxy
MGS_joint_output %>% 
    ggplot(aes(lefse_galaxy_LDA, lefse_R2_LDA)) + 
    geom_point(size = 3, shape = 1) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_vline(xintercept = 0, linetype = "dashed") +
    ggtitle("Comparison of LDA scores of features reported as significant 
    by both lefse-galaxy and run_lefse2")
Comparison of LDA scores of features reported as significant by both lefse-galaxy and run_lefse2 (MGS)

Figure 10.19: Comparison of LDA scores of features reported as significant by both lefse-galaxy and run_lefse2 (MGS)

Results:

  • The overlap between run_lefse2 (microbiomeMarker R package) and lefse-conda or lefse-galaxy have the similar LDA scores. However, the overlap between run_lefse (lefser R package) and lefse-conda or lefse-galaxy seem have slightly different LDA scores.
10.8.4.3.3 Differences bewteen XMAS2 LDA scores and lefse-conda
setdiff(amplicon_ps_genus_lefse_conda$TaxaID, amplicon_ps_genus_lefse_R$TaxaID)
## [1] "g__Holdemania"                  "g__Alcaligenaceae_unclassified"
setdiff(amplicon_ps_genus_lefse_conda$TaxaID, amplicon_ps_genus_lefse_R2$TaxaID)
## character(0)

10.9 Systematic Information

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.1.2 (2021-11-01)
##  os       macOS Monterey 12.2.1
##  system   x86_64, darwin17.0
##  ui       RStudio
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Asia/Shanghai
##  date     2022-10-08
##  rstudio  2022.07.1+554 Spotted Wakerobin (desktop)
##  pandoc   2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package              * version  date (UTC) lib source
##  abind                  1.4-5    2016-07-21 [1] CRAN (R 4.1.0)
##  ade4                   1.7-18   2021-09-16 [1] CRAN (R 4.1.0)
##  ALDEx2                 1.26.0   2021-10-26 [1] Bioconductor
##  annotate               1.72.0   2021-10-26 [1] Bioconductor
##  AnnotationDbi          1.56.2   2021-11-09 [1] Bioconductor
##  ape                  * 5.6-2    2022-03-02 [1] CRAN (R 4.1.2)
##  askpass                1.1      2019-01-13 [1] CRAN (R 4.1.0)
##  assertthat             0.2.1    2019-03-21 [1] CRAN (R 4.1.0)
##  backports              1.4.1    2021-12-13 [1] CRAN (R 4.1.0)
##  base64enc              0.1-3    2015-07-28 [1] CRAN (R 4.1.0)
##  bayesm                 3.1-4    2019-10-15 [1] CRAN (R 4.1.0)
##  Biobase              * 2.54.0   2021-10-26 [1] Bioconductor
##  BiocGenerics         * 0.40.0   2021-10-26 [1] Bioconductor
##  BiocParallel           1.28.3   2021-12-09 [1] Bioconductor
##  biomformat             1.22.0   2021-10-26 [1] Bioconductor
##  Biostrings             2.62.0   2021-10-26 [1] Bioconductor
##  bit                    4.0.4    2020-08-04 [1] CRAN (R 4.1.0)
##  bit64                  4.0.5    2020-08-30 [1] CRAN (R 4.1.0)
##  bitops                 1.0-7    2021-04-24 [1] CRAN (R 4.1.0)
##  blob                   1.2.2    2021-07-23 [1] CRAN (R 4.1.0)
##  bookdown               0.27     2022-06-14 [1] CRAN (R 4.1.2)
##  brio                   1.1.3    2021-11-30 [1] CRAN (R 4.1.0)
##  broom                  1.0.0    2022-07-01 [1] CRAN (R 4.1.2)
##  bslib                  0.3.1    2021-10-06 [1] CRAN (R 4.1.0)
##  cachem                 1.0.6    2021-08-19 [1] CRAN (R 4.1.0)
##  callr                  3.7.0    2021-04-20 [1] CRAN (R 4.1.0)
##  car                    3.0-12   2021-11-06 [1] CRAN (R 4.1.0)
##  carData                3.0-5    2022-01-06 [1] CRAN (R 4.1.2)
##  caTools                1.18.2   2021-03-28 [1] CRAN (R 4.1.0)
##  checkmate              2.0.0    2020-02-06 [1] CRAN (R 4.1.0)
##  class                  7.3-20   2022-01-13 [1] CRAN (R 4.1.2)
##  classInt               0.4-3    2020-04-07 [1] CRAN (R 4.1.0)
##  cli                    3.3.0    2022-04-25 [1] CRAN (R 4.1.2)
##  cluster                2.1.2    2021-04-17 [1] CRAN (R 4.1.2)
##  codetools              0.2-18   2020-11-04 [1] CRAN (R 4.1.2)
##  coin                   1.4-2    2021-10-08 [1] CRAN (R 4.1.0)
##  colorspace             2.0-3    2022-02-21 [1] CRAN (R 4.1.2)
##  compositions           2.0-4    2022-01-05 [1] CRAN (R 4.1.2)
##  conflicted           * 1.1.0    2021-11-26 [1] CRAN (R 4.1.0)
##  corrplot               0.92     2021-11-18 [1] CRAN (R 4.1.0)
##  cowplot                1.1.1    2020-12-30 [1] CRAN (R 4.1.0)
##  crayon                 1.5.0    2022-02-14 [1] CRAN (R 4.1.2)
##  crosstalk              1.2.0    2021-11-04 [1] CRAN (R 4.1.0)
##  data.table             1.14.2   2021-09-27 [1] CRAN (R 4.1.0)
##  DBI                    1.1.2    2021-12-20 [1] CRAN (R 4.1.0)
##  DelayedArray           0.20.0   2021-10-26 [1] Bioconductor
##  DEoptimR               1.0-10   2022-01-03 [1] CRAN (R 4.1.2)
##  desc                   1.4.1    2022-03-06 [1] CRAN (R 4.1.2)
##  DESeq2                 1.34.0   2021-10-26 [1] Bioconductor
##  devtools             * 2.4.3    2021-11-30 [1] CRAN (R 4.1.0)
##  digest                 0.6.29   2021-12-01 [1] CRAN (R 4.1.0)
##  dplyr                * 1.0.8    2022-02-08 [1] CRAN (R 4.1.2)
##  DT                     0.21     2022-02-26 [1] CRAN (R 4.1.2)
##  e1071                  1.7-9    2021-09-16 [1] CRAN (R 4.1.0)
##  edgeR                  3.36.0   2021-10-26 [1] Bioconductor
##  ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.1.0)
##  evaluate               0.15     2022-02-18 [1] CRAN (R 4.1.2)
##  FactoMineR             2.4      2020-12-11 [1] CRAN (R 4.1.0)
##  fansi                  1.0.2    2022-01-14 [1] CRAN (R 4.1.2)
##  farver                 2.1.0    2021-02-28 [1] CRAN (R 4.1.0)
##  fastmap                1.1.0    2021-01-25 [1] CRAN (R 4.1.0)
##  flashClust             1.01-2   2012-08-21 [1] CRAN (R 4.1.0)
##  foreach                1.5.2    2022-02-02 [1] CRAN (R 4.1.2)
##  foreign                0.8-82   2022-01-13 [1] CRAN (R 4.1.2)
##  forestplot             2.0.1    2021-09-03 [1] CRAN (R 4.1.0)
##  formatR                1.11     2021-06-01 [1] CRAN (R 4.1.0)
##  Formula                1.2-4    2020-10-16 [1] CRAN (R 4.1.0)
##  fs                     1.5.2    2021-12-08 [1] CRAN (R 4.1.0)
##  futile.logger        * 1.4.3    2016-07-10 [1] CRAN (R 4.1.0)
##  futile.options         1.0.1    2018-04-20 [1] CRAN (R 4.1.0)
##  genefilter             1.76.0   2021-10-26 [1] Bioconductor
##  geneplotter            1.72.0   2021-10-26 [1] Bioconductor
##  generics               0.1.2    2022-01-31 [1] CRAN (R 4.1.2)
##  GenomeInfoDb         * 1.30.1   2022-01-30 [1] Bioconductor
##  GenomeInfoDbData       1.2.7    2022-03-09 [1] Bioconductor
##  GenomicRanges        * 1.46.1   2021-11-18 [1] Bioconductor
##  ggiraph                0.8.2    2022-02-22 [1] CRAN (R 4.1.2)
##  ggiraphExtra           0.3.0    2020-10-06 [1] CRAN (R 4.1.2)
##  ggplot2              * 3.3.5    2021-06-25 [1] CRAN (R 4.1.0)
##  ggpubr               * 0.4.0    2020-06-27 [1] CRAN (R 4.1.0)
##  ggrepel                0.9.1    2021-01-15 [1] CRAN (R 4.1.0)
##  ggsci                  2.9      2018-05-14 [1] CRAN (R 4.1.0)
##  ggsignif               0.6.3    2021-09-09 [1] CRAN (R 4.1.0)
##  ggVennDiagram          1.2.1    2022-04-13 [1] Github (gaospecial/ggVennDiagram@db6742d)
##  glmnet                 4.1-3    2021-11-02 [1] CRAN (R 4.1.0)
##  glue                 * 1.6.2    2022-02-24 [1] CRAN (R 4.1.2)
##  Gmisc                * 3.0.0    2022-01-03 [1] CRAN (R 4.1.2)
##  gplots                 3.1.1    2020-11-28 [1] CRAN (R 4.1.0)
##  gridExtra              2.3      2017-09-09 [1] CRAN (R 4.1.0)
##  gtable                 0.3.0    2019-03-25 [1] CRAN (R 4.1.0)
##  gtools                 3.9.2    2021-06-06 [1] CRAN (R 4.1.0)
##  highr                  0.9      2021-04-16 [1] CRAN (R 4.1.0)
##  Hmisc                  4.6-0    2021-10-07 [1] CRAN (R 4.1.0)
##  hms                    1.1.1    2021-09-26 [1] CRAN (R 4.1.0)
##  htmlTable            * 2.4.0    2022-01-04 [1] CRAN (R 4.1.2)
##  htmltools              0.5.2    2021-08-25 [1] CRAN (R 4.1.0)
##  htmlwidgets            1.5.4    2021-09-08 [1] CRAN (R 4.1.0)
##  httr                   1.4.2    2020-07-20 [1] CRAN (R 4.1.0)
##  igraph                 1.2.11   2022-01-04 [1] CRAN (R 4.1.2)
##  insight                0.17.0   2022-03-29 [1] CRAN (R 4.1.2)
##  IRanges              * 2.28.0   2021-10-26 [1] Bioconductor
##  iterators              1.0.14   2022-02-05 [1] CRAN (R 4.1.2)
##  jpeg                   0.1-9    2021-07-24 [1] CRAN (R 4.1.0)
##  jquerylib              0.1.4    2021-04-26 [1] CRAN (R 4.1.0)
##  jsonlite               1.8.0    2022-02-22 [1] CRAN (R 4.1.2)
##  kableExtra             1.3.4    2021-02-20 [1] CRAN (R 4.1.2)
##  KEGGREST               1.34.0   2021-10-26 [1] Bioconductor
##  KernSmooth             2.23-20  2021-05-03 [1] CRAN (R 4.1.2)
##  knitr                  1.39     2022-04-26 [1] CRAN (R 4.1.2)
##  labeling               0.4.2    2020-10-20 [1] CRAN (R 4.1.0)
##  lambda.r               1.2.4    2019-09-18 [1] CRAN (R 4.1.0)
##  lattice              * 0.20-45  2021-09-22 [1] CRAN (R 4.1.2)
##  latticeExtra           0.6-29   2019-12-19 [1] CRAN (R 4.1.0)
##  leaps                  3.1      2020-01-16 [1] CRAN (R 4.1.0)
##  libcoin                1.0-9    2021-09-27 [1] CRAN (R 4.1.0)
##  lifecycle              1.0.1    2021-09-24 [1] CRAN (R 4.1.0)
##  limma                  3.50.1   2022-02-17 [1] Bioconductor
##  locfit                 1.5-9.5  2022-03-03 [1] CRAN (R 4.1.2)
##  LOCOM                  1.1      2022-08-05 [1] Github (yijuanhu/LOCOM@c181e0f)
##  lubridate              1.8.0    2021-10-07 [1] CRAN (R 4.1.0)
##  magrittr             * 2.0.2    2022-01-26 [1] CRAN (R 4.1.2)
##  MASS                   7.3-55   2022-01-13 [1] CRAN (R 4.1.2)
##  Matrix                 1.4-0    2021-12-08 [1] CRAN (R 4.1.0)
##  MatrixGenerics       * 1.6.0    2021-10-26 [1] Bioconductor
##  matrixStats          * 0.61.0   2021-09-17 [1] CRAN (R 4.1.0)
##  mbzinb                 0.2      2022-03-16 [1] local
##  memoise                2.0.1    2021-11-26 [1] CRAN (R 4.1.0)
##  metagenomeSeq          1.36.0   2021-10-26 [1] Bioconductor
##  mgcv                   1.8-39   2022-02-24 [1] CRAN (R 4.1.2)
##  microbiome             1.16.0   2021-10-26 [1] Bioconductor
##  modeltools             0.2-23   2020-03-05 [1] CRAN (R 4.1.0)
##  multcomp               1.4-18   2022-01-04 [1] CRAN (R 4.1.2)
##  multtest               2.50.0   2021-10-26 [1] Bioconductor
##  munsell                0.5.0    2018-06-12 [1] CRAN (R 4.1.0)
##  mvtnorm                1.1-3    2021-10-08 [1] CRAN (R 4.1.0)
##  mycor                  0.1.1    2018-04-10 [1] CRAN (R 4.1.0)
##  NADA                   1.6-1.1  2020-03-22 [1] CRAN (R 4.1.0)
##  nlme                 * 3.1-155  2022-01-13 [1] CRAN (R 4.1.2)
##  nnet                   7.3-17   2022-01-13 [1] CRAN (R 4.1.2)
##  openssl                2.0.0    2022-03-02 [1] CRAN (R 4.1.2)
##  permute              * 0.9-7    2022-01-27 [1] CRAN (R 4.1.2)
##  pheatmap             * 1.0.12   2019-01-04 [1] CRAN (R 4.1.0)
##  phyloseq             * 1.38.0   2021-10-26 [1] Bioconductor
##  picante              * 1.8.2    2020-06-10 [1] CRAN (R 4.1.0)
##  pillar                 1.7.0    2022-02-01 [1] CRAN (R 4.1.2)
##  pkgbuild               1.3.1    2021-12-20 [1] CRAN (R 4.1.0)
##  pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.1.0)
##  pkgload                1.2.4    2021-11-30 [1] CRAN (R 4.1.0)
##  plyr                   1.8.6    2020-03-03 [1] CRAN (R 4.1.0)
##  png                    0.1-7    2013-12-03 [1] CRAN (R 4.1.0)
##  ppcor                  1.1      2015-12-03 [1] CRAN (R 4.1.0)
##  prettyunits            1.1.1    2020-01-24 [1] CRAN (R 4.1.0)
##  processx               3.5.2    2021-04-30 [1] CRAN (R 4.1.0)
##  protoclust             1.6.3    2019-01-31 [1] CRAN (R 4.1.0)
##  proxy                  0.4-26   2021-06-07 [1] CRAN (R 4.1.0)
##  ps                     1.6.0    2021-02-28 [1] CRAN (R 4.1.0)
##  pscl                   1.5.5    2020-03-07 [1] CRAN (R 4.1.0)
##  purrr                * 0.3.4    2020-04-17 [1] CRAN (R 4.1.0)
##  qvalue                 2.26.0   2021-10-26 [1] Bioconductor
##  R6                     2.5.1    2021-08-19 [1] CRAN (R 4.1.0)
##  RAIDA                  1.0      2022-03-14 [1] local
##  RColorBrewer         * 1.1-2    2014-12-07 [1] CRAN (R 4.1.0)
##  Rcpp                 * 1.0.8.2  2022-03-11 [1] CRAN (R 4.1.2)
##  RcppZiggurat           0.1.6    2020-10-20 [1] CRAN (R 4.1.0)
##  RCurl                  1.98-1.6 2022-02-08 [1] CRAN (R 4.1.2)
##  readr                * 2.1.2    2022-01-30 [1] CRAN (R 4.1.2)
##  remotes                2.4.2    2021-11-30 [1] CRAN (R 4.1.0)
##  reshape2               1.4.4    2020-04-09 [1] CRAN (R 4.1.0)
##  reticulate             1.24     2022-01-26 [1] CRAN (R 4.1.2)
##  Rfast                  2.0.6    2022-02-16 [1] CRAN (R 4.1.2)
##  rhdf5                  2.38.1   2022-03-10 [1] Bioconductor
##  rhdf5filters           1.6.0    2021-10-26 [1] Bioconductor
##  Rhdf5lib               1.16.0   2021-10-26 [1] Bioconductor
##  rlang                  1.0.2    2022-03-04 [1] CRAN (R 4.1.2)
##  rmarkdown              2.14     2022-04-25 [1] CRAN (R 4.1.2)
##  robustbase             0.93-9   2021-09-27 [1] CRAN (R 4.1.0)
##  rpart                  4.1.16   2022-01-24 [1] CRAN (R 4.1.2)
##  rprojroot              2.0.2    2020-11-15 [1] CRAN (R 4.1.0)
##  RSpectra               0.16-0   2019-12-01 [1] CRAN (R 4.1.0)
##  RSQLite                2.2.10   2022-02-17 [1] CRAN (R 4.1.2)
##  rstatix                0.7.0    2021-02-13 [1] CRAN (R 4.1.0)
##  rstudioapi             0.13     2020-11-12 [1] CRAN (R 4.1.0)
##  Rtsne                  0.15     2018-11-10 [1] CRAN (R 4.1.0)
##  RVenn                  1.1.0    2019-07-18 [1] CRAN (R 4.1.0)
##  rvest                  1.0.2    2021-10-16 [1] CRAN (R 4.1.0)
##  S4Vectors            * 0.32.3   2021-11-21 [1] Bioconductor
##  sandwich               3.0-1    2021-05-18 [1] CRAN (R 4.1.0)
##  sass                   0.4.0    2021-05-12 [1] CRAN (R 4.1.0)
##  scales                 1.1.1    2020-05-11 [1] CRAN (R 4.1.0)
##  scatterplot3d          0.3-41   2018-03-14 [1] CRAN (R 4.1.0)
##  sessioninfo            1.2.2    2021-12-06 [1] CRAN (R 4.1.0)
##  sf                     1.0-7    2022-03-07 [1] CRAN (R 4.1.2)
##  shape                  1.4.6    2021-05-19 [1] CRAN (R 4.1.0)
##  sjlabelled             1.2.0    2022-04-10 [1] CRAN (R 4.1.2)
##  sjmisc                 2.8.9    2021-12-03 [1] CRAN (R 4.1.0)
##  stringi                1.7.6    2021-11-29 [1] CRAN (R 4.1.0)
##  stringr                1.4.0    2019-02-10 [1] CRAN (R 4.1.0)
##  SummarizedExperiment * 1.24.0   2021-10-26 [1] Bioconductor
##  survival               3.3-1    2022-03-03 [1] CRAN (R 4.1.2)
##  svglite                2.1.0    2022-02-03 [1] CRAN (R 4.1.2)
##  systemfonts            1.0.4    2022-02-11 [1] CRAN (R 4.1.2)
##  tensorA                0.36.2   2020-11-19 [1] CRAN (R 4.1.0)
##  testthat               3.1.2    2022-01-20 [1] CRAN (R 4.1.2)
##  TH.data                1.1-0    2021-09-27 [1] CRAN (R 4.1.0)
##  tibble               * 3.1.6    2021-11-07 [1] CRAN (R 4.1.0)
##  tidyr                * 1.2.0    2022-02-01 [1] CRAN (R 4.1.2)
##  tidyselect             1.1.2    2022-02-21 [1] CRAN (R 4.1.2)
##  truncnorm              1.0-8    2018-02-27 [1] CRAN (R 4.1.0)
##  tzdb                   0.2.0    2021-10-27 [1] CRAN (R 4.1.0)
##  umap                   0.2.8.0  2022-03-23 [1] CRAN (R 4.1.2)
##  units                  0.8-0    2022-02-05 [1] CRAN (R 4.1.2)
##  usethis              * 2.1.5    2021-12-09 [1] CRAN (R 4.1.0)
##  utf8                   1.2.2    2021-07-24 [1] CRAN (R 4.1.0)
##  uuid                   1.0-3    2021-11-01 [1] CRAN (R 4.1.0)
##  vctrs                  0.3.8    2021-04-29 [1] CRAN (R 4.1.0)
##  vegan                * 2.5-7    2020-11-28 [1] CRAN (R 4.1.0)
##  VennDiagram          * 1.7.3    2022-04-12 [1] CRAN (R 4.1.2)
##  viridis              * 0.6.2    2021-10-13 [1] CRAN (R 4.1.0)
##  viridisLite          * 0.4.0    2021-04-13 [1] CRAN (R 4.1.0)
##  vroom                  1.5.7    2021-11-30 [1] CRAN (R 4.1.0)
##  webshot                0.5.3    2022-04-14 [1] CRAN (R 4.1.2)
##  withr                  2.5.0    2022-03-03 [1] CRAN (R 4.1.2)
##  Wrench                 1.12.0   2021-10-26 [1] Bioconductor
##  xfun                   0.30     2022-03-02 [1] CRAN (R 4.1.2)
##  XMAS2                * 2.1.8    2022-10-08 [1] local
##  XML                    3.99-0.9 2022-02-24 [1] CRAN (R 4.1.2)
##  xml2                   1.3.3    2021-11-30 [1] CRAN (R 4.1.0)
##  xtable                 1.8-4    2019-04-21 [1] CRAN (R 4.1.0)
##  XVector                0.34.0   2021-10-26 [1] Bioconductor
##  yaml                   2.3.5    2022-02-21 [1] CRAN (R 4.1.2)
##  zCompositions          1.4.0    2022-01-13 [1] CRAN (R 4.1.2)
##  zlibbioc               1.40.0   2021-10-26 [1] Bioconductor
##  zoo                    1.8-9    2021-03-09 [1] CRAN (R 4.1.0)
## 
##  [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
## 
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

References

Chen, Jun, Emily King, Rebecca Deek, Zhi Wei, Yue Yu, Diane Grill, and Karla Ballman. 2018. “An Omnibus Test for Differential Distribution Analysis of Microbiome Sequencing Data.” Bioinformatics 34 (4): 643–51.
Chen, Li, James Reeve, Lujun Zhang, Shengbing Huang, Xuefeng Wang, and Jun Chen. 2018. “GMPR: A Robust Normalization Method for Zero-Inflated Count Data with Application to Microbiome Sequencing Data.” PeerJ 6: e4600.
Fernandes, Andrew D, Jennifer Ns Reid, Jean M Macklaim, Thomas A McMurrough, David R Edgell, and Gregory B Gloor. 2014. “Unifying the Analysis of High-Throughput Sequencing Datasets: Characterizing RNA-Seq, 16s rRNA Gene Sequencing and Selective Growth Experiments by Compositional Data Analysis.” Microbiome 2 (1): 1–13.
Hu, Yingtian, Glen A Satten, and Yi-Juan Hu. 2022. “LOCOM: A Logistic Regression Model for Testing Differential Abundance in Compositional Microbiome Data with False Discovery Rate Control.” Proceedings of the National Academy of Sciences 119 (30): e2122788119.
Law, Charity W, Yunshun Chen, Wei Shi, and Gordon K Smyth. 2014. “Voom: Precision Weights Unlock Linear Model Analysis Tools for RNA-Seq Read Counts.” Genome Biology 15 (2): 1–17.
Love, Michael I, Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome Biology 15 (12): 1–21.
Mallick, Himel, Ali Rahnavard, Lauren J McIver, Siyuan Ma, Yancong Zhang, Long H Nguyen, Timothy L Tickle, et al. 2021. “Multivariable Association Discovery in Population-Scale Meta-Omics Studies.” PLoS Computational Biology 17 (11): e1009442.
Mandal, Siddhartha, Will Van Treuren, Richard A White, Merete Eggesbø, Rob Knight, and Shyamal D Peddada. 2015. “Analysis of Composition of Microbiomes: A Novel Method for Studying Microbial Composition.” Microbial Ecology in Health and Disease 26 (1): 27663.
Martin, Bryan D, Daniela Witten, and Amy D Willis. 2020. “Modeling Microbial Abundances and Dysbiosis with Beta-Binomial Regression.” The Annals of Applied Statistics 14 (1): 94.
Nearing, Jacob T, Gavin M Douglas, Molly G Hayes, Jocelyn MacDonald, Dhwani K Desai, Nicole Allward, Casey Jones, et al. 2022. “Microbiome Differential Abundance Methods Produce Different Results Across 38 Datasets.” Nature Communications 13 (1): 1–16.
Paulson, Joseph N, O Colin Stine, Héctor Corrada Bravo, and Mihai Pop. 2013. “Differential Abundance Analysis for Microbial Marker-Gene Surveys.” Nature Methods 10 (12): 1200–1202.
Robinson, Mark D, and Alicia Oshlack. 2010. “A Scaling Normalization Method for Differential Expression Analysis of RNA-Seq Data.” Genome Biology 11 (3): 1–9.
Segata, Nicola, Jacques Izard, Levi Waldron, Dirk Gevers, Larisa Miropolsky, Wendy S Garrett, and Curtis Huttenhower. 2011. “Metagenomic Biomarker Discovery and Explanation.” Genome Biology 12 (6): 1–18.
Sohn, Michael B, Ruofei Du, and Lingling An. 2015. “A Robust Approach for Identifying Differentially Abundant Features in Metagenomic Samples.” Bioinformatics 31 (14): 2269–75.