Chapter 14 Examples
Here, we give users two examples to practice the data analysis workflow by XMAS 2.0. By the way, we also recommend users handling your own microbiota data in a reasonable manner when you utilize this package or workflow. Pay attention to whether your data fit the methods this package provided.
14.1 Loading Packages
library(XMAS2)
library(dplyr)
library(tibble)
library(phyloseq)
library(ggplot2)
library(ggpubr)
library(SummarizedExperiment)
14.2 Workflow description
The standard data analysis of 16S and MGS data by XMAS 2.0.
- 16S
Figure 14.1: Functions of XMAS 2.0 in 16s
- MGS
Figure 14.2: Functions of XMAS 2.0 in MGS
14.3 Amplicon sequencing (16s)
The upstream process is performed by in-house pipeline. and this example just shows how to perform downstream data analysis. In briefly, the example comprises the following steps:
Converting inputs into phyloseq object;
Quality Evaluation;
Pre-Processing Data;
Diversity analysis;
Ordination analysis;
Composition analysis;
Differential analysis.
14.3.1 Converting inputs into phyloseq-class object
dada2 result from standardized_analytics_workflow_R_function.
/home/xuxiaomin/project/standardized_analytics_workflow_R_function/demo_data/16S/process/xdada2/dada2_res.rds
/home/xuxiaomin/project/standardized_analytics_workflow_R_function/demo_data/16S/process/fasta2tree/tree.nwk
/home/xuxiaomin/project/standardized_analytics_workflow_R_function/demo_data/16S/metadata.txt
# dada2 results from in-house 16s pipeline
<- readRDS(
dada2_res system.file(
"extdata", "dada2_res.rds",
package = "XMAS2"
)
)
# the metadata matches to dada2 result
<- read.table(
sam_tab system.file(
"extdata", "dada2_metadata.tsv",
package = "XMAS2"
),sep = "\t",
header = TRUE,
stringsAsFactors = FALSE
)
# tree file from dada2 reference data silva
<- phyloseq::read_tree(
tree system.file(
"extdata", "tree.nwk",
package = "XMAS2"
)
)
<- import_dada2_taxa(dada2_taxa = dada2_res$tax_tab)
tax_tab <- dada2_res$seq_tab
otu_tab <- sam_tab %>% tibble::column_to_rownames("seqID")
sam_tab
# Shouldn't use the Total Number as SampleID (wrong: 123456; right: X123456)
rownames(otu_tab) <- paste0("S", rownames(otu_tab))
rownames(sam_tab) <- paste0("S", rownames(sam_tab))
<- get_dada2_phyloseq(
dada2_ps seq_tab = otu_tab,
tax_tab = tax_tab,
sam_tab = sam_tab,
phy_tree = tree)
dada2_ps
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 896 taxa and 24 samples ]
## sample_data() Sample Data: [ 24 samples by 1 sample variables ]
## tax_table() Taxonomy Table: [ 896 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 896 tips and 893 internal nodes ]
## refseq() DNAStringSet: [ 896 reference sequences ]
Here, the phyloseq object comprises five components (OTU Table, Sample Data, Taxonomy Table, Phylogenetic Tree and DNAStringSet).
14.3.1.1 Summarize phyloseq-class object
summarize_phyloseq(ps = dada2_ps)
## Compositional = NO2
## 1] Min. number of reads = 511812] Max. number of reads = 936223] Total number of reads = 15025374] Average number of reads = 62605.70833333335] Median number of reads = 619157] Sparsity = 0.8653738839285716] Any OTU sum to 1 or less? NO8] Number of singletons = 09] Percent of OTUs that are singletons
## (i.e. exactly one read detected across all samples)010] Number of sample variables are: 1Group2
## [[1]]
## [1] "1] Min. number of reads = 51181"
##
## [[2]]
## [1] "2] Max. number of reads = 93622"
##
## [[3]]
## [1] "3] Total number of reads = 1502537"
##
## [[4]]
## [1] "4] Average number of reads = 62605.7083333333"
##
## [[5]]
## [1] "5] Median number of reads = 61915"
##
## [[6]]
## [1] "7] Sparsity = 0.865373883928571"
##
## [[7]]
## [1] "6] Any OTU sum to 1 or less? NO"
##
## [[8]]
## [1] "8] Number of singletons = 0"
##
## [[9]]
## [1] "9] Percent of OTUs that are singletons\n (i.e. exactly one read detected across all samples)0"
##
## [[10]]
## [1] "10] Number of sample variables are: 1"
##
## [[11]]
## [1] "Group"
The minus account of the OTU counts is 51181 in the phyloseq object, and we can use it as the threshold to rarefy.
Notice the Sparsity (0.865), indicating the data has many zeros and pay attention to the downstream data analysis. A common property of amplicon based microbiota data generated by sequencing.
14.3.2 Quality Control
Quality control of DADA2 results will help us have more rational determinations on the further data analysis.
14.3.2.1 Reads’ track by DADA2
plot_Dada2Track(data = dada2_res$reads_track)

Figure 14.3: Reads’ track by DADA2 (16s example)
The percentage of the final remained read counts approximate 70%, indicating that we should consider the sequence depth for analysis when we build the sequence library.
14.3.2.2 Spike-in sample (BRS) assessment
- Extract the genus level phyloseq and getting the BRS_ID
<- summarize_taxa(ps = dada2_ps,
dada2_ps_genus taxa_level = "Genus")
sample_data(dada2_ps_genus)
## Group
## S6030 BB
## S6032 BB
## S6033 BB
## S6035 AA
## S6036 BB
## S6037 AA
## S6040 BB
## S6043 AA
## S6045 BB
## S6046 BB
## S6048 BB
## S6049 AA
## S6050 BB
## S6054 BB
## S6055 BB
## S6058 BB
## S6059 AA
## S6060 AA
## S6061 AA
## S6063 BB
## S6065 AA
## S6066 AA
## S6068 BB
## S8005 QC
The BRS_ID is S8005 .
- Run
run_RefCheck
run_RefCheck(
ps = dada2_ps_genus,
BRS_ID = "S8005", Ref_type = "16s")
## Noting: the Reference Matrix is for 16s
## S8005 is in the Reference Matrix's samples and remove it to run
##
## ############Matched baterica of the BRS sample#############
## The number of BRS' bacteria matched the Reference Matrix is [15]
## g__Bifidobacterium
## g__Bacteroides
## g__Faecalibacterium
## g__Lactobacillus
## g__Parabacteroides
## g__Collinsella
## g__Coprococcus_3
## g__Dorea
## g__Streptococcus
## g__Roseburia
## g__Anaerostipes
## g__Escherichia_Shigella
## g__Enterococcus
## g__Prevotella_9
## g__Eggerthella
##
## The number of the additional bacteria compared to Reference Matrix is [1]
## ###########################################################
##
## ##################Status of the BRS sample##################
## Whether the BRS has the all bateria of Reference Matrix: TRUE
## Correlation Coefficient of the BRS is: 0.9714
## Bray Curtis of the BRS is: 0.07607
## Impurity of the BRS is: 0.06409
## ###########################################################
## #####Final Evaluation Results of the BRS #######
## The BRS of sequencing dataset passed the cutoff of the Reference Matrix
## Cutoff of Coefficient is 0.8946
## Cutoff of BrayCurtis is 0.3878
## Cutoff of Impurity is 0.1565
## ###########################################################
## 8002 8003 8004 8006 8007 8008 8009 8005 mean
## Bifidobacterium 31.11079015 30.88310969 32.31232692 18.4930259 20.20409870 17.96225391 18.03588291 27.22437034 24.52823232
## Bacteroides 20.44753484 14.46581958 24.57151411 26.7370147 25.85863655 27.51353663 26.99272343 24.23896093 23.85321759
## Faecalibacterium 0.79850615 0.62937893 1.05531023 1.7487249 1.64282727 1.96346413 1.81219797 1.04035376 1.33634542
## Lactobacillus 2.61732573 3.36856272 3.44379163 5.9292703 5.78000836 5.78189064 6.32672332 3.87088505 4.63980722
## Parabacteroides 7.11124408 7.45952579 5.36075144 8.7149995 8.01840234 8.74899584 8.62634005 5.61833757 7.45732457
## Collinsella 0.12792605 0.88271385 0.55665744 1.2764130 0.67921372 1.89356397 1.26367828 0.45502126 0.89189845
## Coprococcus_3 1.00380683 0.97969362 0.80270938 1.6693557 1.56419908 1.67969035 1.71221463 0.87586251 1.28594151
## Dorea 2.80715148 3.45564660 2.46613277 3.9684605 3.99163530 3.81529666 3.69382881 2.34880690 3.31836988
## Streptococcus 2.91960260 3.43387563 2.59149764 3.4818362 3.31409452 3.51378702 3.37721491 2.68740253 3.16491388
## Roseburia 0.03404484 0.04750030 0.02806676 0.0338295 0.03178586 0.02503886 0.02499583 0.03311188 0.03229673
## Anaerostipes 0.32291011 0.43245062 0.31528329 0.5386697 0.44500209 0.53416240 0.48047548 0.32471000 0.42420796
## Escherichia_Shigella 15.27581475 16.00265210 12.36527954 10.8423545 13.43203680 10.15743185 11.59945565 14.03516268 12.96377349
## Enterococcus 14.51444842 14.66472707 11.04239952 13.1674820 12.07360937 12.88771113 12.84924735 11.61906390 12.85233610
## Prevotella_9 0.77374627 3.07465463 2.75709154 2.9145415 2.52446675 3.04952478 2.84952508 5.40257632 2.91826586
## Eggerthella 0.04951976 0.15437596 0.27131203 0.4840221 0.43998327 0.47365181 0.35549631 0.16128688 0.29870601
## Impurity_level 0.08562792 0.06531291 0.05987576 0.0000000 0.00000000 0.00000000 0.00000000 0.06409000 0.03436332
14.3.2.3 Spike-in sample’s remove
<- get_GroupPhyloseq(
dada2_ps_remove_BRS ps = dada2_ps,
group = "Group",
group_names = "QC",
discard = TRUE)
dada2_ps_remove_BRS
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 896 taxa and 23 samples ]
## sample_data() Sample Data: [ 23 samples by 1 sample variables ]
## tax_table() Taxonomy Table: [ 896 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 896 tips and 893 internal nodes ]
## refseq() DNAStringSet: [ 896 reference sequences ]
14.3.2.4 Rarefaction curves
plot_RarefCurve(ps = dada2_ps_remove_BRS,
taxa_level = "OTU",
step = 400,
label = "Group",
color = "Group")
## rarefying sample S6030
## rarefying sample S6032
## rarefying sample S6033
## rarefying sample S6035
## rarefying sample S6036
## rarefying sample S6037
## rarefying sample S6040
## rarefying sample S6043
## rarefying sample S6045
## rarefying sample S6046
## rarefying sample S6048
## rarefying sample S6049
## rarefying sample S6050
## rarefying sample S6054
## rarefying sample S6055
## rarefying sample S6058
## rarefying sample S6059
## rarefying sample S6060
## rarefying sample S6061
## rarefying sample S6063
## rarefying sample S6065
## rarefying sample S6066
## rarefying sample S6068

Figure 14.4: Rarefaction curves (16s example)
The result showed that all the samples had different sequence depth but had the full sample richness.
14.3.3 Data processing
This part has too may procedures and we only choose some of them. Please go to Chapter 6 to see more approaches and details for being familiar with this part.
14.3.3.1 Rarefy otu counts
From previous results of quality evaluation, the sequence depth of samples are different which have effects on the downstream analysis. Here, choosing the rarefy (Normaliztion method: random subsampling counts to the smallest library size) to get the equal sample sums.
<- norm_rarefy(object = dada2_ps_remove_BRS,
dada2_ps_rarefy size = 51181)
dada2_ps_rarefy
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 891 taxa and 23 samples ]
## sample_data() Sample Data: [ 23 samples by 1 sample variables ]
## tax_table() Taxonomy Table: [ 891 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 891 tips and 888 internal nodes ]
## refseq() DNAStringSet: [ 891 reference sequences ]
In addition, we could also perform some other normalization methods on the rarefied phyloseq object. By the way, we didn’t transform the data by using log algorithm because the count matrix is required by the following data analysis methods.
14.3.3.2 Extracting specific taxonomic level
<- summarize_taxa(ps = dada2_ps_rarefy,
dada2_ps_rare_genus taxa_level = "Genus")
dada2_ps_rare_genus
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 198 taxa and 23 samples ]
## sample_data() Sample Data: [ 23 samples by 1 sample variables ]
## tax_table() Taxonomy Table: [ 198 taxa by 6 taxonomic ranks ]
14.3.3.3 Filtering the low relative abundance or unclassified taxa by the threshold (total counts < 10)
<- run_filter(ps = dada2_ps_rare_genus,
dada2_ps_rare_genus_filter cutoff = 10,
unclass = TRUE)
dada2_ps_rare_genus_filter
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 149 taxa and 23 samples ]
## sample_data() Sample Data: [ 23 samples by 1 sample variables ]
## tax_table() Taxonomy Table: [ 149 taxa by 6 taxonomic ranks ]
14.3.3.4 Trimming the taxa with low occurrence less than threshold
<- run_trim(object = dada2_ps_rare_genus_filter,
dada2_ps_rare_genus_filter_trim cutoff = 0.2,
trim = "feature")
dada2_ps_rare_genus_filter_trim
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 100 taxa and 23 samples ]
## sample_data() Sample Data: [ 23 samples by 1 sample variables ]
## tax_table() Taxonomy Table: [ 100 taxa by 6 taxonomic ranks ]
Finally, we obtained the final phyloseq-class object dada2_ps_rare_genus_filter_trim and changed its name.
14.3.4 Diversity analysis
14.3.4.1 Alpha diveristy
- Calculate the alpha diversity
Notes: the otu table must be counts matrix (rarefied but not trim counts matrix) when you choose Observed etc measures.
<- run_alpha_diversity(ps = dada2_ps_rare_genus,
dada_ps_rare_genus_alpha measures = c("Shannon", "Chao1", "Observed"))
print(dada_ps_rare_genus_alpha)
## TempRowNames Group Observed Chao1 se.chao1 Shannon
## 1 S6030 BB 74 74.00000 0.0000000 2.622800
## 2 S6032 BB 38 38.00000 0.0000000 1.711068
## 3 S6033 BB 80 80.16667 0.5431879 2.719495
## 4 S6035 AA 35 35.00000 0.1232013 1.953807
## 5 S6036 BB 95 95.00000 0.0000000 2.942657
## 6 S6037 AA 35 35.00000 0.0000000 1.933007
## 7 S6040 BB 58 58.00000 0.0000000 2.490198
## 8 S6043 AA 65 65.33333 0.9246628 2.098003
## 9 S6045 BB 60 60.00000 0.0000000 2.920363
## 10 S6046 BB 71 71.00000 0.1241166 2.264433
## 11 S6048 BB 66 66.00000 0.0000000 2.630421
## 12 S6049 AA 60 60.00000 0.0000000 2.506885
## 13 S6050 BB 53 53.00000 0.0000000 2.431334
## 14 S6054 BB 84 84.00000 0.0000000 2.769973
## 15 S6055 BB 75 75.00000 0.0000000 2.297388
## 16 S6058 BB 37 37.00000 0.2465985 2.262059
## 17 S6059 AA 71 71.00000 0.0000000 2.682960
## 18 S6060 AA 48 48.00000 0.0000000 1.883809
## 19 S6061 AA 80 80.00000 0.0000000 3.088651
## 20 S6063 BB 38 38.00000 0.0000000 1.543267
## 21 S6065 AA 77 77.00000 0.0000000 2.877568
## 22 S6066 AA 67 67.00000 0.0000000 2.638564
## 23 S6068 BB 40 40.00000 0.0000000 2.100363
- visualization
plot_boxplot(data = dada_ps_rare_genus_alpha,
y_index = c("Shannon", "Chao1", "Observed"),
group = "Group",
group_names = c("AA", "BB"),
group_color = c("red", "blue"),
method = "wilcox.test")

Figure 14.5: Alpha diversity (16s example)
14.3.4.2 Beta diversity
- beta dipersion
<- run_beta_diversity(ps = dada2_ps_rare_genus_filter_trim,
dada2_ps_beta method = "bray",
group = "Group")
##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 1 0.01412 0.0141198 1.6675 999 0.201
## Residuals 21 0.17783 0.0084679
##
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
## AA BB
## AA 0.21
## BB 0.21063
$BetaDispersion dada2_ps_beta

Figure 14.6: Beta diversity (16s example)
14.3.5 PERMANOVA + Ordination
14.3.5.1 PERMANOVA
<- run_permanova(ps = dada2_ps_rare_genus_filter_trim,
dada2_ps_per method = "bray",
columns = "Group")
print(dada2_ps_per)
## SumsOfSample Df SumsOfSqs MeanSqs F.Model R2 Pr(>F) AdjustedPvalue
## Group 23 1 0.2290643 0.2290643 1.331995 0.05964515 0.225 0.225
The PERMANOVA result of the Group (Pr(>F)
> 0.05) revealed that the two groups had not the distinct patterns of microbial community.
14.3.5.2 Ordination
We performed ordination by using Principal Coordinate Analysis (PCoA). If you want to try other methods please go to see Chapter 8 for more details.
<- run_ordination(
dada2_ps_ordination ps = dada2_ps_rare_genus_filter_trim,
group = "Group",
method = "PCoA")
plot_Ordination(ResultList = dada2_ps_ordination,
group = "Group",
group_names = c("AA", "BB"),
group_color = c("blue", "red"))

Figure 14.7: PCoA (16s example)
14.3.6 Microbial composition
A whole picture of the microbial composition.
14.3.6.1 Stacked barplot
- XMAS package
plot_StackBarPlot(
ps = dada2_ps_rarefy,
taxa_level = "Phylum",
group = "Group",
cluster = TRUE)
## [1] "This palatte have 20 colors!"

Figure 14.8: Microbial composition (16s example)
- XVIZ package
plot_stacked_bar_XIVZ(
phyloseq = dada2_ps_rarefy,
level = "Phylum",
feature = "Group")

Figure 14.9: Microbial composition (16s example) XVIZ
14.3.6.2 Core microbiota
- convert absolute abundance into relative abundance
<- XMAS2::normalize(object = dada2_ps_rare_genus_filter_trim,
dada2_ps_rare_genus_filter_trim_rb method = "TSS")
dada2_ps_rare_genus_filter_trim_rb
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 100 taxa and 23 samples ]
## sample_data() Sample Data: [ 23 samples by 1 sample variables ]
## tax_table() Taxonomy Table: [ 100 taxa by 6 taxonomic ranks ]
- visualization
<- seq(0.05, 1, 0.05)
prevalences <- 10^seq(log10(1e-3), log10(0.2), length = 10)
detections
<- plot_core_taxa(dada2_ps_rare_genus_filter_trim_rb,
pl_core plot.type = "heatmap",
colours = gray(seq(0, 1, length=5)),
prevalences = prevalences,
detections = detections,
min.prevalence = 0.5)+
xlab("Detection Threshold (Relative Abundance (%))")
pl_core

Figure 14.10: Core taxa (16s example)
The degree of color indicates the size of abundance and prevalence.
- Use
core_members
to obtain the core taxa. detection for abundance and prevalence for occurrence.
<- core_members(dada2_ps_rare_genus_filter_trim_rb, detection = 0.01, prevalence = 0.8)
core_taxa_name print(core_taxa_name)
## [1] "g__Bifidobacterium" "g__Blautia"
Result:
Only 2 genera (g__Bifidobacterium and g__Blautia) passed the threshold of detection and prevalence which we choose.
14.3.7 Differential Analysis
There are more than 10 approaches to perform differential analysis. Here, we choose two of them and recommend users going to Chapter 10 to see more detials.
14.3.7.1 Liner discriminant analysis (LDA) effect size (LEfSe)
- Calculation
<- run_lefse(
dada2_ps_lefse ps = dada2_ps_rare_genus_filter_trim,
group = "Group",
group_names = c("AA", "BB"),
norm = "CPM",
Lda = 2)
head(dada2_ps_lefse)
## TaxaID Block Enrichment LDA_Score EffectSize Log2FoldChange (Median)\nAA_vs_BB
## 1 g__Clostridium_sensu_stricto_1 9_AA vs 14_BB BB 3.677589 2.679298 NA
## 2 g__Intestinibacter 9_AA vs 14_BB BB 3.177811 2.384308 NA
## 3 g__Lactobacillus 9_AA vs 14_BB BB 4.093463 2.450656 -3.336128
## 4 g__Odoribacter 9_AA vs 14_BB BB 2.058258 1.876320 NA
## 5 g__Parasutterella 9_AA vs 14_BB AA -3.662466 2.234915 4.402050
## 6 g__Romboutsia 9_AA vs 14_BB BB 3.699671 2.929140 -6.856424
## Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All)
## 1 225.11921 0.00000 1173.88222 -4.746617 5861.9051
## 2 383.50911 0.00000 1398.79740 -1.964554 1883.3386
## 3 1283.85401 520.46061 5256.08763 -5.173067 19938.7655
## 4 21.93175 604.99989 0.00000 2.099583 578.0758
## 5 172.77125 1080.35695 51.09966 4.481148 3901.0977
## 6 2189.79596 50.35627 5835.02623 -3.334490 7094.1306
## Mean Abundance\nAA Mean Abundance\nBB Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB
## 1 350.3379 9405.0554 60.87 22.22 85.71
## 2 680.6440 2656.4994 60.87 22.22 85.71
## 3 892.0297 32183.0957 86.96 77.78 92.86
## 4 1083.9017 252.9020 52.17 77.78 35.71
## 5 9320.3061 417.3209 65.22 88.89 50.00
## 6 1086.1419 10956.4090 78.26 55.56 92.86
## Odds Ratio (95% CI)
## 1 4900 (5000;4900)
## 2 3.4 (5.8;1)
## 3 1.1e+08 (1.1e+08;1.1e+08)
## 4 0.31 (-2;2.6)
## 5 0.0025 (-12;12)
## 6 76 (85;68)
- Visualization
# # don't run this code when you do lefse in reality
# dada2_ps_lefse$LDA_Score <- dada2_ps_lefse$LDA_Score * 1000
plot_lefse(
da_res = dada2_ps_lefse,
x_index = "LDA_Score",
x_index_cutoff = 1,
group_color = c("green", "red"))

Figure 14.11: Lefse analysis (16s example)
14.3.7.2 Wilcoxon Rank-Sum test
- Calculation
<- run_wilcox(
dada2_ps_wilcox ps = dada2_ps_rare_genus_filter_trim,
group = "Group",
group_names = c("AA", "BB"))
head(dada2_ps_wilcox)
## TaxaID Block Enrichment EffectSize Statistic Pvalue AdjustedPvalue Log2FoldChange (Median)\nAA_vs_BB
## 1 g__Acidaminococcus 9_AA vs 14_BB Nonsignif 70.56349 63.5 1.0000000 1.0000000 NA
## 2 g__Actinomyces 9_AA vs 14_BB Nonsignif 39.65079 44.5 0.2556616 0.7014746 -0.9577718
## 3 g__Adlercreutzia 9_AA vs 14_BB Nonsignif 23.17460 44.0 0.1980169 0.7014746 NA
## 4 g__Agathobacter 9_AA vs 14_BB Nonsignif 1222.58730 71.0 0.6293971 0.8964389 0.1524904
## 5 g__Akkermansia 9_AA vs 14_BB Nonsignif 104.78571 64.5 0.9304707 0.9692403 NA
## 6 g__Alistipes 9_AA vs 14_BB Nonsignif 23.03175 67.0 0.8239942 0.9363570 -0.1085245
## Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB Log2FoldChange (Rank)\nAA_vs_BB Mean Rank Abundance\nAA
## 1 0 0 0.0 0.01201252 12.06
## 2 30 26 50.5 -0.42227633 9.94
## 3 0 0 9.0 -0.43387758 9.89
## 4 316 329 296.0 0.17342686 12.89
## 5 0 0 0.0 0.03358045 12.17
## 6 64 64 69.0 0.08724541 12.44
## Mean Rank Abundance\nBB Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB Odds Ratio (95% CI)
## 1 11.96 21.74 22.22 21.43 0.67 (-0.11;1.5)
## 2 13.32 82.61 88.89 78.57 4.1 (6.8;1.3)
## 3 13.36 43.48 33.33 50.00 8.6 (13;4.3)
## 4 11.43 65.22 55.56 71.43 0.39 (-1.4;2.2)
## 5 11.89 21.74 22.22 21.43 1.5 (2.3;0.71)
## 6 11.71 73.91 77.78 71.43 0.88 (0.64;1.1)
- Volcano
plot_volcano(
da_res = dada2_ps_wilcox,
group_names = c("AA", "BB"),
x_index = "Log2FoldChange (Rank)\nAA_vs_BB",
x_index_cutoff = 0.5,
y_index = "Pvalue",
y_index_cutoff = 0.05,
group_color = c("red", "grey", "blue"),
topN = 5)

Figure 14.12: Wilcoxon Rank-Sum test (16s example)
14.4 Metagenomics (MGS)
The metagenomic data analysis pipeline is just the same as 16s. In briefly, the example comprises the following steps:
Converting inputs into phyloseq object;
Quality Evaluation;
Pre-Processing Data;
Diversity analysis;
Ordination analysis;
Composition analysis;
14.4.1 Converting inputs into phyloseq-class object
The result of the in-house Metaphlan2/3 pipeline:
/home/xuxiaomin/project/standardized_analytics_workflow_R_function/demo_data/MGS/metaphlan2_merged.tsv
/home/xuxiaomin/project/standardized_analytics_workflow_R_function/demo_data/MGS/metadata.txt
<- read.table(
metaphlan2_res system.file(
"extdata", "metaphlan2_merged.tsv",
package = "XMAS2"
),header = TRUE,
stringsAsFactors = FALSE
)<- read.table(
metaphlan2_sam system.file(
"extdata", "metaphlan2_metadata.tsv",
package = "XMAS2"
),sep = "\t",
header = TRUE,
stringsAsFactors = FALSE
)
<- import_metaphlan_taxa(data_metaphlan2 = metaphlan2_res,
metaphlan2_res_list taxa_level = "Species")
<- metaphlan2_res_list$abu_tab
otu_tab <- metaphlan2_res_list$tax_tab
tax_tab <- metaphlan2_sam %>% tibble::column_to_rownames("SampleID")
sam_tab
<- get_metaphlan_phyloseq(
metaphlan2_ps otu_tab = otu_tab,
sam_tab = sam_tab,
tax_tab = tax_tab)
metaphlan2_ps
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 326 taxa and 23 samples ]
## sample_data() Sample Data: [ 23 samples by 2 sample variables ]
## tax_table() Taxonomy Table: [ 326 taxa by 7 taxonomic ranks ]
Here, the phyloseq object comprises three components (OTU Table, Sample Data and Taxonomy Table).
14.4.1.1 Summarize phyloseq-class object
summarize_phyloseq(ps = metaphlan2_ps)
## Compositional = NO2
## 1] Min. number of reads = 0.98218212] Max. number of reads = 1.00000053] Total number of reads = 22.94511414] Average number of reads = 0.9976136565217395] Median number of reads = 0.99980247] Sparsity = 0.7131234995998936] Any OTU sum to 1 or less? YES8] Number of singletons = 3239] Percent of OTUs that are singletons
## (i.e. exactly one read detected across all samples)010] Number of sample variables are: 2Groupphynotype2
## [[1]]
## [1] "1] Min. number of reads = 0.9821821"
##
## [[2]]
## [1] "2] Max. number of reads = 1.0000005"
##
## [[3]]
## [1] "3] Total number of reads = 22.9451141"
##
## [[4]]
## [1] "4] Average number of reads = 0.997613656521739"
##
## [[5]]
## [1] "5] Median number of reads = 0.9998024"
##
## [[6]]
## [1] "7] Sparsity = 0.713123499599893"
##
## [[7]]
## [1] "6] Any OTU sum to 1 or less? YES"
##
## [[8]]
## [1] "8] Number of singletons = 323"
##
## [[9]]
## [1] "9] Percent of OTUs that are singletons\n (i.e. exactly one read detected across all samples)0"
##
## [[10]]
## [1] "10] Number of sample variables are: 2"
##
## [[11]]
## [1] "Group" "phynotype"
Notice the Sparsity (0.713), indicating the data has many zeros and pay attention to the downstream data analysis.
14.4.2 Quality Control
14.4.2.1 Spike-in sample (BRS) assessment
- Extract the species level phyloseq and obtain the BRS_ID
<- summarize_taxa(ps = metaphlan2_ps,
metaphlan2_ps_species taxa_level = "Species")
@sam_data metaphlan2_ps_species
## Group phynotype
## s1 BB 0.00
## s2 AA 2.50
## s3 BB 0.00
## s4 AA 1.25
## s5 AA 30.00
## s6 AA 15.00
## s7 BB 8.75
## s8 BB 0.00
## s9 BB 3.75
## s10 BB 2.50
## s11 BB 15.00
## s12 BB 2.50
## s13 BB 2.50
## s14 BB 0.00
## s15 BB 1.07
## s16 BB 2.50
## s17 AA 5.00
## s18 BB 35.00
## s19 BB 7.50
## s20 BB 15.00
## s21 AA 3.75
## s22 AA 3.75
## refE QC NA
- Run
run_RefCheck
run_RefCheck(
ps = metaphlan2_ps_species,
BRS_ID = "refE", Ref_type = "MGS")
## Noting: the Reference Matrix is for MGS
##
## ############Matched baterica of the BRS sample#############
## The number of BRS' bacteria matched the Reference Matrix is [16]
## s__Bifidobacterium_longum
## s__Bacteroides_uniformis
## s__Faecalibacterium_prausnitzii
## s__Bifidobacterium_adolescentis
## s__Bacteroides_thetaiotaomicron
## s__Collinsella_aerofaciens
## s__Coprococcus_comes
## s__Dorea_formicigenerans
## s__Streptococcus_salivarius
## s__Bacteroides_xylanisolvens
## s__Bacteroides_ovatus
## s__Roseburia_hominis
## s__Bacteroides_vulgatus
## s__Prevotella_copri
## s__Bifidobacterium_pseudocatenulatum
## s__Lachnospiraceae_bacterium_5_1_63FAA
## The number of bacteria unmatched the Reference Matrix is [9]
## s__Lactobacillus_salivarius
## s__Bacteroides_fragilis
## s__Parabacteroides_goldsteinii
## s__Escherichia_coli
## s__Enterococcus_faecalis
## s__Bifidobacterium_bifidum
## s__Lactobacillus_pentosus
## s__Bacteroides_intestinalis
## s__Eggerthella_unclassified
## The number of the additional bacteria compared to the Reference Matrix is [56]
## ###########################################################
##
## ##################Status of the BRS sample##################
## Whether the BRS has the all bateria of Reference Matrix: FALSE
## Correlation Coefficient of the BRS is: -0.06471
## Bray Curtis of the BRS is: 0.8752
## Impurity of the BRS is: 32.69
## ###########################################################
## #####Final Evaluation Results of the BRS #######
## The BRS of sequencing dataset didn't pass the cutoff of the Reference Matrix
## ###########################################################
## 7682 7683 7684 7685 7842 7843 7844 7845 refE mean
## Bifidobacterium_longum 10.31563 9.25812 9.69184 7.76031 11.03311 11.61484 12.29030 11.69019 0.01646 9.29675556
## Bacteroides_uniformis 2.24195 2.24035 1.92015 2.18435 2.43230 2.38180 2.13830 2.41437 0.21061 2.01824222
## Faecalibacterium_prausnitzii 0.65615 0.60153 0.60112 0.62079 0.54147 0.55383 0.58806 0.54655 1.61939 0.70321000
## Bifidobacterium_adolescentis 7.05426 6.28460 6.57297 6.25448 4.69357 4.80628 4.94943 4.84278 0.04649 5.05609556
## Bacteroides_thetaiotaomicron 3.25076 3.31897 3.22418 3.43809 3.35611 3.38323 3.29098 3.30355 1.47422 3.11556556
## Collinsella_aerofaciens 0.56220 0.53249 0.60476 0.47934 0.65513 0.68833 0.76063 0.66896 0.07251 0.55826111
## Coprococcus_comes 2.28581 2.42978 2.25227 2.82904 1.21527 1.16485 1.07160 1.07834 0.01444 1.59348889
## Dorea_formicigenerans 4.83509 5.02149 5.18268 5.56891 3.34720 3.09877 3.08677 2.70732 0.02775 3.65288667
## Streptococcus_salivarius 3.54266 3.74119 3.62036 4.01546 2.90216 2.70193 2.61760 2.57348 0.02846 2.86036667
## Bacteroides_xylanisolvens 1.55648 1.84824 1.91166 1.85273 1.75220 1.74002 1.69811 1.67676 0.32466 1.59565111
## Bacteroides_ovatus 3.08489 3.27226 3.10904 3.24565 3.51376 3.50063 3.37872 3.42300 0.25782 2.97619667
## Roseburia_hominis 0.04383 0.04183 0.04107 0.02304 0.03853 0.03464 0.03532 0.03597 0.01307 0.03414444
## Bacteroides_vulgatus 3.06713 3.20369 3.14979 3.15352 3.24822 3.09280 3.13038 3.06113 2.14684 3.02816667
## Prevotella_copri 2.03128 1.99619 1.92504 2.15422 1.57638 1.60584 1.57913 1.59224 60.84109 8.36682333
## Bifidobacterium_pseudocatenulatum 6.89310 6.36177 7.39605 5.76804 6.19464 6.47094 7.41615 6.27837 0.15023 5.88103222
## Lachnospiraceae_bacterium_5_1_63FAA 0.22002 0.26062 0.28058 0.33329 0.07093 0.06778 0.05229 0.04781 0.06825 0.15573000
## Impurity_level 7.33876 7.34799 7.14891 6.47250 8.88190 9.72198 9.72162 10.57600 32.69000 11.09996222
## Evaluation
## Bifidobacterium_longum refE didn't pass the threshold (2022-10-08 15:36:53).
## Bacteroides_uniformis refE didn't pass the threshold (2022-10-08 15:36:53).
## Faecalibacterium_prausnitzii refE didn't pass the threshold (2022-10-08 15:36:53).
## Bifidobacterium_adolescentis refE didn't pass the threshold (2022-10-08 15:36:53).
## Bacteroides_thetaiotaomicron refE didn't pass the threshold (2022-10-08 15:36:53).
## Collinsella_aerofaciens refE didn't pass the threshold (2022-10-08 15:36:53).
## Coprococcus_comes refE didn't pass the threshold (2022-10-08 15:36:53).
## Dorea_formicigenerans refE didn't pass the threshold (2022-10-08 15:36:53).
## Streptococcus_salivarius refE didn't pass the threshold (2022-10-08 15:36:53).
## Bacteroides_xylanisolvens refE didn't pass the threshold (2022-10-08 15:36:53).
## Bacteroides_ovatus refE didn't pass the threshold (2022-10-08 15:36:53).
## Roseburia_hominis refE didn't pass the threshold (2022-10-08 15:36:53).
## Bacteroides_vulgatus refE didn't pass the threshold (2022-10-08 15:36:53).
## Prevotella_copri refE didn't pass the threshold (2022-10-08 15:36:53).
## Bifidobacterium_pseudocatenulatum refE didn't pass the threshold (2022-10-08 15:36:53).
## Lachnospiraceae_bacterium_5_1_63FAA refE didn't pass the threshold (2022-10-08 15:36:53).
## Impurity_level refE didn't pass the threshold (2022-10-08 15:36:53).
The spike-in samples didn’t pass the cutoff and failed to add the the Reference Matrix.
14.4.2.2 Spike-in sample’s remove
<- get_GroupPhyloseq(
metaphlan2_ps_remove_BRS ps = metaphlan2_ps,
group = "Group",
group_names = "QC",
discard = TRUE)
metaphlan2_ps_remove_BRS
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 326 taxa and 22 samples ]
## sample_data() Sample Data: [ 22 samples by 2 sample variables ]
## tax_table() Taxonomy Table: [ 326 taxa by 7 taxonomic ranks ]
14.4.3 Data processing
This part has too may procedures and we only choose some of them. Please go to Chapter 6 to see more approaches and details for being familiar with this part.
14.4.3.1 Extracting specific taxonomic level
<- summarize_taxa(ps = metaphlan2_ps_remove_BRS,
metaphlan2_ps_species taxa_level = "Species")
metaphlan2_ps_species
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 325 taxa and 22 samples ]
## sample_data() Sample Data: [ 22 samples by 2 sample variables ]
## tax_table() Taxonomy Table: [ 325 taxa by 7 taxonomic ranks ]
14.4.3.2 Filtering the low relative abundance or unclassified taxa by the threshold (total counts < 1e-4)
The condition to filter low relative abundance is according to this article (Thingholm et al. 2019).
Species from taxonomic profiles were retained for further analysis if their mean relative abundance exceeded 0.005 (0.5%) across the dataset with a minimum abundance of 0.05 (5%) in at least one sample and non-zero abundance in at least 60% of samples.
There are three conditions
Mean relative abundance: 0.005;
Minimum relative abundance: 0.05;
Occurrence: 60%.
Here, we use 0.01 (the 1e-4 regarded as 0.01 compared to the Referece because Metaphlan2 data had been divided 100)
<- run_filter(ps = metaphlan2_ps_species,
metaphlan2_ps_species_filter cutoff = 1e-4,
unclass = TRUE)
metaphlan2_ps_species_filter
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 233 taxa and 22 samples ]
## sample_data() Sample Data: [ 22 samples by 2 sample variables ]
## tax_table() Taxonomy Table: [ 233 taxa by 7 taxonomic ranks ]
14.4.3.3 Trimming the taxa with low occurrence less than threshold
<- run_trim(object = metaphlan2_ps_species_filter,
metaphlan2_ps_species_filter_trim cutoff = 0.1,
trim = "feature")
metaphlan2_ps_species_filter_trim
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 179 taxa and 22 samples ]
## sample_data() Sample Data: [ 22 samples by 2 sample variables ]
## tax_table() Taxonomy Table: [ 179 taxa by 7 taxonomic ranks ]
Finally, we obtained the final phyloseq-class object metaphlan2_ps_species_filter_trim and changed its name.
14.4.4 Diversity analysis
14.4.4.1 Alpha diveristy
- Calculate the alpha diversity
Notes: choosing the measures (Shannon, Simpson and InvSimpson) only for relative abundance.
<- run_alpha_diversity(ps = metaphlan2_ps_remove_BRS,
metaphlan2_ps_genus_alpha measures = c("Shannon", "Simpson", "InvSimpson"))
head(metaphlan2_ps_genus_alpha)
## TempRowNames Group phynotype Shannon Simpson InvSimpson
## 1 s1 BB 0.00 2.876002 0.9108857 11.221549
## 2 s2 AA 2.50 2.045392 0.8105742 5.279111
## 3 s3 BB 0.00 3.441176 0.9439371 17.837114
## 4 s4 AA 1.25 2.746917 0.8290768 5.850579
## 5 s5 AA 30.00 1.450722 0.6412178 2.787207
## 6 s6 AA 15.00 2.619951 0.8950369 9.527154
- visualization
plot_boxplot(data = metaphlan2_ps_genus_alpha,
y_index = c("Shannon", "Simpson", "InvSimpson"),
group = "Group",
group_names = c("AA", "BB"),
group_color = c("red", "blue"),
method = "wilcox.test")

Figure 14.13: Alpha diversity (MGS example)
14.4.4.2 Beta diversity
- beta dipersion
<- run_beta_diversity(ps = metaphlan2_ps_species_filter_trim,
metaphlan2_ps_beta method = "bray",
group = "Group")
##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 1 0.005966 0.0059663 0.7914 999 0.372
## Residuals 20 0.150783 0.0075392
##
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
## AA BB
## AA 0.365
## BB 0.38426
$BetaDispersion metaphlan2_ps_beta

Figure 14.14: Beta diversity (MGS example)
14.4.5 PERMANOVA + Ordination
14.4.5.1 PERMANOVA
<- run_permanova(ps = metaphlan2_ps_species_filter_trim,
metaphlan2_ps_per method = "bray",
columns = "Group")
head(metaphlan2_ps_per)
## SumsOfSample Df SumsOfSqs MeanSqs F.Model R2 Pr(>F) AdjustedPvalue
## Group 22 1 0.8452127 0.8452127 2.728018 0.1200289 0.002 0.002
The PERMANOVA result of the Group (AdjustedPvalue < 0.05) revealed that the two groups had the distinct patterns of microbial community.
14.4.5.2 Ordination
We performed ordination by using Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP). If you want to try other methods please go to see Chapter 8 for more details.
<- run_ordination(
metaphlan2_ps_ordination ps = metaphlan2_ps_species_filter_trim,
group = "Group",
method = "UMAP")
plot_Ordination(ResultList = metaphlan2_ps_ordination,
group = "Group",
group_names = c("AA", "BB"),
group_color = c("blue", "red"),
sample = TRUE,
sidelinechart = FALSE,
circle_type = "ellipse_line",
sideboxplot = TRUE)

Figure 14.15: PCoA (MGS example)
14.4.6 Microbial composition
14.4.6.1 Stacked barplot
- XVIZ package
plot_stacked_bar_XIVZ(
phyloseq = metaphlan2_ps_species,
level = "Phylum",
feature = "Group")

Figure 14.16: Microbial composition (MGS) XVIZ
- XMAS package
plot_StackBarPlot(
ps = metaphlan2_ps_species,
taxa_level = "Phylum",
group = "Group",
cluster = TRUE)
## [1] "This palatte have 20 colors!"

Figure 14.17: Microbial composition (MGS example)
14.4.6.2 Core microbiota
- visualization
library(RColorBrewer)
<- seq(0.05, 1, 0.05)
prevalences <- 10^seq(log10(1e-3), log10(.2), length = 10)
detections
plot_core_taxa(metaphlan2_ps_species_filter_trim,
plot.type = "heatmap",
colours = rev(brewer.pal(5, "Spectral")),
prevalences = prevalences,
detections = detections,
min.prevalence = 0.5)+
xlab("Detection Threshold (Relative Abundance (%))") +
theme(axis.text.y = element_text(face="italic"))

Figure 14.18: Core taxa (MGS example)
The degree of color indicates the size of abundance and prevalence.
- Use
core_members
to obtain the core taxa. detection for abundance and prevalence for occurrence.
<- core_members(metaphlan2_ps_species_filter_trim,
core_taxa_name detection = 0.001,
prevalence = 0.5)
print(core_taxa_name)
## [1] "s__Bacteroides_ovatus" "s__Bacteroides_thetaiotaomicron" "s__Bacteroides_uniformis"
## [4] "s__Bacteroides_vulgatus" "s__Bifidobacterium_longum" "s__Bifidobacterium_pseudocatenulatum"
## [7] "s__Collinsella_aerofaciens" "s__Escherichia_coli" "s__Eubacterium_eligens"
## [10] "s__Eubacterium_hallii" "s__Faecalibacterium_prausnitzii" "s__Roseburia_inulinivorans"
## [13] "s__Ruminococcus_gnavus" "s__Ruminococcus_obeum" "s__Ruminococcus_sp_5_1_39BFAA"
## [16] "s__Ruminococcus_torques" "s__Streptococcus_salivarius"
Result:
17 species passed the threshold of detection and prevalence which we choose.
14.4.7 Differential Analysis
There are more than 10 approaches to perform differential analysis. Here, we choose two of them and recommend users going to Chapter 10 to see more detials.
14.4.7.1 Liner discriminant analysis (LDA) effect size (LEfSe)
- Calculation
<- run_lefse(
metaphlan2_ps_lefse ps = metaphlan2_ps_species_filter_trim,
group = "Group",
group_names = c("AA", "BB"),
norm = "CPM",
Lda = 2)
head(metaphlan2_ps_lefse)
## TaxaID Block Enrichment LDA_Score EffectSize Log2FoldChange (Median)\nAA_vs_BB
## 1 s__Actinomyces_odontolyticus 7_AA vs 15_BB BB 2.354203 0.09521043 NA
## 2 s__Bacteroides_thetaiotaomicron 7_AA vs 15_BB AA -4.429029 4.25590505 5.422189
## 3 s__Bifidobacterium_adolescentis 7_AA vs 15_BB BB 4.443556 4.18357937 NA
## 4 s__Bifidobacterium_longum 7_AA vs 15_BB BB 4.446101 2.06541430 -4.651371
## 5 s__Clostridium_bartlettii 7_AA vs 15_BB BB 2.927763 1.73866565 NA
## 6 s__Collinsella_aerofaciens 7_AA vs 15_BB BB 3.839013 3.22865691 NA
## Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB Log2FoldChange (Mean)\nAA_vs_BB Mean Abundance\n(All)
## 1 0.0000 0.000 5.595339 NA 16.83488
## 2 2878.5476 46631.686 1087.526189 3.198416 24113.15854
## 3 0.0000 0.000 3843.156200 NA 36041.41284
## 4 11049.6932 1203.167 30236.246304 -3.717363 41205.25130
## 5 287.0742 0.000 459.271930 -3.712705 1293.74022
## 6 7678.6818 0.000 13208.610915 -3.239758 12628.30960
## Mean Abundance\nAA Mean Abundance\nBB Occurrence (100%)\n(All) Occurrence (100%)\nAA Occurrence (100%)\nBB
## 1 0.0000 24.69116 40.91 0.00 60.00
## 2 61441.3764 6693.32352 90.91 100.00 86.67
## 3 0.0000 52860.73882 40.91 0.00 60.00
## 4 4437.1597 58363.69406 86.36 85.71 86.67
## 5 139.7505 1832.26875 68.18 42.86 80.00
## 6 1868.4023 17649.59965 63.64 42.86 73.33
## Odds Ratio (95% CI)
## 1 <NA>
## 2 0.064 (-5.3;5.5)
## 3 <NA>
## 4 270 (280;260)
## 5 450 (470;440)
## 6 28 (35;22)
- Visualization
# # don't run this code when you do lefse in reality
# metaphlan2_ps_lefse$LDA_Score <- metaphlan2_ps_lefse$LDA_Score * 100
plot_lefse(
da_res = metaphlan2_ps_lefse,
x_index = "LDA_Score",
x_index_cutoff = 2,
group_color = c("green", "red"))

Figure 14.19: Lefse analysis (MGS example)
14.4.7.2 Wilcoxon Rank-Sum test
- Calculation
<- run_wilcox(
metaphlan2_ps_wilcox ps = metaphlan2_ps_species_filter_trim,
group = "Group",
group_names = c("AA", "BB"))
head(metaphlan2_ps_wilcox)
## TaxaID Block Enrichment EffectSize Statistic Pvalue AdjustedPvalue
## 1 s__Acidaminococcus_fermentans 7_AA vs 15_BB Nonsignif 0.09951572 49.0 0.75333110 0.9364324
## 2 s__Acidaminococcus_intestini 7_AA vs 15_BB Nonsignif 0.75222128 51.0 0.91657416 1.0000000
## 3 s__Actinomyces_odontolyticus 7_AA vs 15_BB Nonsignif 4.45220244 21.0 0.01422085 0.2314119
## 4 s__Adlercreutzia_equolifaciens 7_AA vs 15_BB Nonsignif 0.60623026 28.0 0.05769072 0.4414970
## 5 s__Akkermansia_muciniphila 7_AA vs 15_BB Nonsignif 1.45465974 63.0 0.23762783 0.5826765
## 6 s__Alistipes_finegoldii 7_AA vs 15_BB Nonsignif 0.66890792 73.5 0.11454477 0.5119018
## Log2FoldChange (Median)\nAA_vs_BB Median Abundance\n(All) Median Abundance\nAA Median Abundance\nBB
## 1 NA 0 0.0000000 0.00e+00
## 2 NA 0 0.0000000 0.00e+00
## 3 NA 0 0.0000000 3.90e-06
## 4 NA 0 0.0000000 5.58e-05
## 5 NA 0 0.0000000 0.00e+00
## 6 NA 0 0.0027362 0.00e+00
## Log2FoldChange (Rank)\nAA_vs_BB Mean Rank Abundance\nAA Mean Rank Abundance\nBB Occurrence (100%)\n(All)
## 1 -0.09269949 11.00 11.73 18.18
## 2 -0.03907932 11.29 11.60 18.18
## 3 -0.95817982 7.00 13.60 40.91
## 4 -0.71479501 8.00 13.13 40.91
## 5 0.26748031 13.00 10.80 13.64
## 6 0.52169761 14.50 10.10 45.45
## Occurrence (100%)\nAA Occurrence (100%)\nBB Odds Ratio (95% CI)
## 1 14.29 20.00 1.5 (2.4;0.69)
## 2 14.29 20.00 0.77 (0.26;1.3)
## 3 0.00 60.00 <NA>
## 4 14.29 53.33 Inf (Inf;Inf)
## 5 28.57 6.67 0.83 (0.48;1.2)
## 6 57.14 40.00 0.0028 (-12;12)
- Volcano
plot_volcano(
da_res = metaphlan2_ps_wilcox,
group_names = c("AA", "BB"),
x_index = "Log2FoldChange (Rank)\nAA_vs_BB",
x_index_cutoff = 0.5,
y_index = "Pvalue",
y_index_cutoff = 0.05,
group_color = c("red", "grey", "blue"),
topN = 5)

Figure 14.20: Wilcoxon Rank-Sum test (MGS example)
14.5 Systematic Information
::session_info() devtools
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────
## setting value
## version R version 4.1.2 (2021-11-01)
## os macOS Monterey 12.2.1
## system x86_64, darwin17.0
## ui RStudio
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Asia/Shanghai
## date 2022-10-08
## rstudio 2022.07.1+554 Spotted Wakerobin (desktop)
## pandoc 2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## abind 1.4-5 2016-07-21 [1] CRAN (R 4.1.0)
## ade4 1.7-18 2021-09-16 [1] CRAN (R 4.1.0)
## ALDEx2 1.26.0 2021-10-26 [1] Bioconductor
## annotate 1.72.0 2021-10-26 [1] Bioconductor
## AnnotationDbi 1.56.2 2021-11-09 [1] Bioconductor
## ape * 5.6-2 2022-03-02 [1] CRAN (R 4.1.2)
## askpass 1.1 2019-01-13 [1] CRAN (R 4.1.0)
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
## backports 1.4.1 2021-12-13 [1] CRAN (R 4.1.0)
## base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.1.0)
## bayesm 3.1-4 2019-10-15 [1] CRAN (R 4.1.0)
## Biobase * 2.54.0 2021-10-26 [1] Bioconductor
## BiocGenerics * 0.40.0 2021-10-26 [1] Bioconductor
## BiocParallel 1.28.3 2021-12-09 [1] Bioconductor
## biomformat 1.22.0 2021-10-26 [1] Bioconductor
## Biostrings 2.62.0 2021-10-26 [1] Bioconductor
## bit 4.0.4 2020-08-04 [1] CRAN (R 4.1.0)
## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.1.0)
## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.1.0)
## blob 1.2.2 2021-07-23 [1] CRAN (R 4.1.0)
## bookdown 0.27 2022-06-14 [1] CRAN (R 4.1.2)
## brio 1.1.3 2021-11-30 [1] CRAN (R 4.1.0)
## broom 1.0.0 2022-07-01 [1] CRAN (R 4.1.2)
## bslib 0.3.1 2021-10-06 [1] CRAN (R 4.1.0)
## cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.0)
## callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.0)
## car 3.0-12 2021-11-06 [1] CRAN (R 4.1.0)
## carData 3.0-5 2022-01-06 [1] CRAN (R 4.1.2)
## caTools 1.18.2 2021-03-28 [1] CRAN (R 4.1.0)
## checkmate 2.0.0 2020-02-06 [1] CRAN (R 4.1.0)
## class 7.3-20 2022-01-13 [1] CRAN (R 4.1.2)
## classInt 0.4-3 2020-04-07 [1] CRAN (R 4.1.0)
## cli 3.3.0 2022-04-25 [1] CRAN (R 4.1.2)
## cluster 2.1.2 2021-04-17 [1] CRAN (R 4.1.2)
## codetools 0.2-18 2020-11-04 [1] CRAN (R 4.1.2)
## coin 1.4-2 2021-10-08 [1] CRAN (R 4.1.0)
## colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.1.2)
## compositions 2.0-4 2022-01-05 [1] CRAN (R 4.1.2)
## conflicted * 1.1.0 2021-11-26 [1] CRAN (R 4.1.0)
## corrplot 0.92 2021-11-18 [1] CRAN (R 4.1.0)
## cowplot 1.1.1 2020-12-30 [1] CRAN (R 4.1.0)
## crayon 1.5.0 2022-02-14 [1] CRAN (R 4.1.2)
## crosstalk 1.2.0 2021-11-04 [1] CRAN (R 4.1.0)
## data.table 1.14.2 2021-09-27 [1] CRAN (R 4.1.0)
## DBI 1.1.2 2021-12-20 [1] CRAN (R 4.1.0)
## DelayedArray 0.20.0 2021-10-26 [1] Bioconductor
## DEoptimR 1.0-10 2022-01-03 [1] CRAN (R 4.1.2)
## desc 1.4.1 2022-03-06 [1] CRAN (R 4.1.2)
## DESeq2 1.34.0 2021-10-26 [1] Bioconductor
## devtools * 2.4.3 2021-11-30 [1] CRAN (R 4.1.0)
## digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.0)
## dplyr * 1.0.8 2022-02-08 [1] CRAN (R 4.1.2)
## DT 0.21 2022-02-26 [1] CRAN (R 4.1.2)
## e1071 1.7-9 2021-09-16 [1] CRAN (R 4.1.0)
## edgeR 3.36.0 2021-10-26 [1] Bioconductor
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
## evaluate 0.15 2022-02-18 [1] CRAN (R 4.1.2)
## FactoMineR 2.4 2020-12-11 [1] CRAN (R 4.1.0)
## fansi 1.0.2 2022-01-14 [1] CRAN (R 4.1.2)
## farver 2.1.0 2021-02-28 [1] CRAN (R 4.1.0)
## fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
## flashClust 1.01-2 2012-08-21 [1] CRAN (R 4.1.0)
## foreach 1.5.2 2022-02-02 [1] CRAN (R 4.1.2)
## foreign 0.8-82 2022-01-13 [1] CRAN (R 4.1.2)
## forestplot 2.0.1 2021-09-03 [1] CRAN (R 4.1.0)
## formatR 1.11 2021-06-01 [1] CRAN (R 4.1.0)
## Formula 1.2-4 2020-10-16 [1] CRAN (R 4.1.0)
## fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.0)
## futile.logger * 1.4.3 2016-07-10 [1] CRAN (R 4.1.0)
## futile.options 1.0.1 2018-04-20 [1] CRAN (R 4.1.0)
## genefilter 1.76.0 2021-10-26 [1] Bioconductor
## geneplotter 1.72.0 2021-10-26 [1] Bioconductor
## generics 0.1.2 2022-01-31 [1] CRAN (R 4.1.2)
## GenomeInfoDb * 1.30.1 2022-01-30 [1] Bioconductor
## GenomeInfoDbData 1.2.7 2022-03-09 [1] Bioconductor
## GenomicRanges * 1.46.1 2021-11-18 [1] Bioconductor
## ggiraph 0.8.2 2022-02-22 [1] CRAN (R 4.1.2)
## ggiraphExtra 0.3.0 2020-10-06 [1] CRAN (R 4.1.2)
## ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
## ggpubr * 0.4.0 2020-06-27 [1] CRAN (R 4.1.0)
## ggrepel 0.9.1 2021-01-15 [1] CRAN (R 4.1.0)
## ggsci 2.9 2018-05-14 [1] CRAN (R 4.1.0)
## ggsignif 0.6.3 2021-09-09 [1] CRAN (R 4.1.0)
## ggVennDiagram 1.2.1 2022-04-13 [1] Github (gaospecial/ggVennDiagram@db6742d)
## glmnet 4.1-3 2021-11-02 [1] CRAN (R 4.1.0)
## glue * 1.6.2 2022-02-24 [1] CRAN (R 4.1.2)
## Gmisc * 3.0.0 2022-01-03 [1] CRAN (R 4.1.2)
## gplots 3.1.1 2020-11-28 [1] CRAN (R 4.1.0)
## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.1.0)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
## gtools 3.9.2 2021-06-06 [1] CRAN (R 4.1.0)
## highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
## Hmisc 4.6-0 2021-10-07 [1] CRAN (R 4.1.0)
## hms 1.1.1 2021-09-26 [1] CRAN (R 4.1.0)
## htmlTable * 2.4.0 2022-01-04 [1] CRAN (R 4.1.2)
## htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0)
## htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.1.0)
## httr 1.4.2 2020-07-20 [1] CRAN (R 4.1.0)
## igraph 1.2.11 2022-01-04 [1] CRAN (R 4.1.2)
## insight 0.17.0 2022-03-29 [1] CRAN (R 4.1.2)
## IRanges * 2.28.0 2021-10-26 [1] Bioconductor
## iterators 1.0.14 2022-02-05 [1] CRAN (R 4.1.2)
## jpeg 0.1-9 2021-07-24 [1] CRAN (R 4.1.0)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.0)
## jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.1.2)
## kableExtra 1.3.4 2021-02-20 [1] CRAN (R 4.1.2)
## KEGGREST 1.34.0 2021-10-26 [1] Bioconductor
## KernSmooth 2.23-20 2021-05-03 [1] CRAN (R 4.1.2)
## knitr 1.39 2022-04-26 [1] CRAN (R 4.1.2)
## labeling 0.4.2 2020-10-20 [1] CRAN (R 4.1.0)
## lambda.r 1.2.4 2019-09-18 [1] CRAN (R 4.1.0)
## lattice * 0.20-45 2021-09-22 [1] CRAN (R 4.1.2)
## latticeExtra 0.6-29 2019-12-19 [1] CRAN (R 4.1.0)
## leaps 3.1 2020-01-16 [1] CRAN (R 4.1.0)
## libcoin 1.0-9 2021-09-27 [1] CRAN (R 4.1.0)
## lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.0)
## limma 3.50.1 2022-02-17 [1] Bioconductor
## locfit 1.5-9.5 2022-03-03 [1] CRAN (R 4.1.2)
## LOCOM 1.1 2022-08-05 [1] Github (yijuanhu/LOCOM@c181e0f)
## lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.1.0)
## magrittr * 2.0.2 2022-01-26 [1] CRAN (R 4.1.2)
## MASS 7.3-55 2022-01-13 [1] CRAN (R 4.1.2)
## Matrix 1.4-0 2021-12-08 [1] CRAN (R 4.1.0)
## MatrixGenerics * 1.6.0 2021-10-26 [1] Bioconductor
## matrixStats * 0.61.0 2021-09-17 [1] CRAN (R 4.1.0)
## mbzinb 0.2 2022-03-16 [1] local
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.1.0)
## metagenomeSeq 1.36.0 2021-10-26 [1] Bioconductor
## mgcv 1.8-39 2022-02-24 [1] CRAN (R 4.1.2)
## microbiome 1.16.0 2021-10-26 [1] Bioconductor
## modeltools 0.2-23 2020-03-05 [1] CRAN (R 4.1.0)
## multcomp 1.4-18 2022-01-04 [1] CRAN (R 4.1.2)
## multtest 2.50.0 2021-10-26 [1] Bioconductor
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0)
## mvtnorm 1.1-3 2021-10-08 [1] CRAN (R 4.1.0)
## mycor 0.1.1 2018-04-10 [1] CRAN (R 4.1.0)
## NADA 1.6-1.1 2020-03-22 [1] CRAN (R 4.1.0)
## nlme * 3.1-155 2022-01-13 [1] CRAN (R 4.1.2)
## nnet 7.3-17 2022-01-13 [1] CRAN (R 4.1.2)
## openssl 2.0.0 2022-03-02 [1] CRAN (R 4.1.2)
## permute * 0.9-7 2022-01-27 [1] CRAN (R 4.1.2)
## pheatmap * 1.0.12 2019-01-04 [1] CRAN (R 4.1.0)
## phyloseq * 1.38.0 2021-10-26 [1] Bioconductor
## picante * 1.8.2 2020-06-10 [1] CRAN (R 4.1.0)
## pillar 1.7.0 2022-02-01 [1] CRAN (R 4.1.2)
## pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.1.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
## pkgload 1.2.4 2021-11-30 [1] CRAN (R 4.1.0)
## plyr 1.8.6 2020-03-03 [1] CRAN (R 4.1.0)
## png 0.1-7 2013-12-03 [1] CRAN (R 4.1.0)
## ppcor 1.1 2015-12-03 [1] CRAN (R 4.1.0)
## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.0)
## processx 3.5.2 2021-04-30 [1] CRAN (R 4.1.0)
## protoclust 1.6.3 2019-01-31 [1] CRAN (R 4.1.0)
## proxy 0.4-26 2021-06-07 [1] CRAN (R 4.1.0)
## ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.0)
## pscl 1.5.5 2020-03-07 [1] CRAN (R 4.1.0)
## purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
## qvalue 2.26.0 2021-10-26 [1] Bioconductor
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.0)
## RAIDA 1.0 2022-03-14 [1] local
## RColorBrewer * 1.1-2 2014-12-07 [1] CRAN (R 4.1.0)
## Rcpp * 1.0.8.2 2022-03-11 [1] CRAN (R 4.1.2)
## RcppZiggurat 0.1.6 2020-10-20 [1] CRAN (R 4.1.0)
## RCurl 1.98-1.6 2022-02-08 [1] CRAN (R 4.1.2)
## readr * 2.1.2 2022-01-30 [1] CRAN (R 4.1.2)
## remotes 2.4.2 2021-11-30 [1] CRAN (R 4.1.0)
## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.1.0)
## reticulate 1.24 2022-01-26 [1] CRAN (R 4.1.2)
## Rfast 2.0.6 2022-02-16 [1] CRAN (R 4.1.2)
## rhdf5 2.38.1 2022-03-10 [1] Bioconductor
## rhdf5filters 1.6.0 2021-10-26 [1] Bioconductor
## Rhdf5lib 1.16.0 2021-10-26 [1] Bioconductor
## rlang 1.0.2 2022-03-04 [1] CRAN (R 4.1.2)
## rmarkdown 2.14 2022-04-25 [1] CRAN (R 4.1.2)
## robustbase 0.93-9 2021-09-27 [1] CRAN (R 4.1.0)
## rpart 4.1.16 2022-01-24 [1] CRAN (R 4.1.2)
## rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0)
## RSpectra 0.16-0 2019-12-01 [1] CRAN (R 4.1.0)
## RSQLite 2.2.10 2022-02-17 [1] CRAN (R 4.1.2)
## rstatix 0.7.0 2021-02-13 [1] CRAN (R 4.1.0)
## rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
## Rtsne 0.15 2018-11-10 [1] CRAN (R 4.1.0)
## RVenn 1.1.0 2019-07-18 [1] CRAN (R 4.1.0)
## rvest 1.0.2 2021-10-16 [1] CRAN (R 4.1.0)
## S4Vectors * 0.32.3 2021-11-21 [1] Bioconductor
## sandwich 3.0-1 2021-05-18 [1] CRAN (R 4.1.0)
## sass 0.4.0 2021-05-12 [1] CRAN (R 4.1.0)
## scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
## scatterplot3d 0.3-41 2018-03-14 [1] CRAN (R 4.1.0)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.0)
## sf 1.0-7 2022-03-07 [1] CRAN (R 4.1.2)
## shape 1.4.6 2021-05-19 [1] CRAN (R 4.1.0)
## sjlabelled 1.2.0 2022-04-10 [1] CRAN (R 4.1.2)
## sjmisc 2.8.9 2021-12-03 [1] CRAN (R 4.1.0)
## stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.0)
## stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
## SummarizedExperiment * 1.24.0 2021-10-26 [1] Bioconductor
## survival 3.3-1 2022-03-03 [1] CRAN (R 4.1.2)
## svglite 2.1.0 2022-02-03 [1] CRAN (R 4.1.2)
## systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.1.2)
## tensorA 0.36.2 2020-11-19 [1] CRAN (R 4.1.0)
## testthat 3.1.2 2022-01-20 [1] CRAN (R 4.1.2)
## TH.data 1.1-0 2021-09-27 [1] CRAN (R 4.1.0)
## tibble * 3.1.6 2021-11-07 [1] CRAN (R 4.1.0)
## tidyr * 1.2.0 2022-02-01 [1] CRAN (R 4.1.2)
## tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.1.2)
## truncnorm 1.0-8 2018-02-27 [1] CRAN (R 4.1.0)
## tzdb 0.2.0 2021-10-27 [1] CRAN (R 4.1.0)
## umap 0.2.8.0 2022-03-23 [1] CRAN (R 4.1.2)
## units 0.8-0 2022-02-05 [1] CRAN (R 4.1.2)
## usethis * 2.1.5 2021-12-09 [1] CRAN (R 4.1.0)
## utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
## uuid 1.0-3 2021-11-01 [1] CRAN (R 4.1.0)
## vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
## vegan * 2.5-7 2020-11-28 [1] CRAN (R 4.1.0)
## VennDiagram * 1.7.3 2022-04-12 [1] CRAN (R 4.1.2)
## viridis * 0.6.2 2021-10-13 [1] CRAN (R 4.1.0)
## viridisLite * 0.4.0 2021-04-13 [1] CRAN (R 4.1.0)
## vroom 1.5.7 2021-11-30 [1] CRAN (R 4.1.0)
## webshot 0.5.3 2022-04-14 [1] CRAN (R 4.1.2)
## withr 2.5.0 2022-03-03 [1] CRAN (R 4.1.2)
## Wrench 1.12.0 2021-10-26 [1] Bioconductor
## xfun 0.30 2022-03-02 [1] CRAN (R 4.1.2)
## XMAS2 * 2.1.8 2022-10-08 [1] local
## XML 3.99-0.9 2022-02-24 [1] CRAN (R 4.1.2)
## xml2 1.3.3 2021-11-30 [1] CRAN (R 4.1.0)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.1.0)
## XVector 0.34.0 2021-10-26 [1] Bioconductor
## yaml 2.3.5 2022-02-21 [1] CRAN (R 4.1.2)
## zCompositions 1.4.0 2022-01-13 [1] CRAN (R 4.1.2)
## zlibbioc 1.40.0 2021-10-26 [1] Bioconductor
## zoo 1.8-9 2021-03-09 [1] CRAN (R 4.1.0)
##
## [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
##
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────