Wang Bin, Shen Yulong, Fang Jingyan, Su Xiaoquan, Xu Zhenjiang Zech
School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
School of Information Engineering, Nanchang University, Nanchang, 330031, China.
Adv Sci (Weinh). 2024 Dec;11(45):e2404277. doi: 10.1002/advs.202404277. Epub 2024 Oct 15.
Microbial data analysis poses significant challenges because the data are high-dimensional, sparse, and compositional. Recent advances have shown that integrating abundance and phylogenetic information is an effective strategy for uncovering robust patterns and enhancing predictive performance in microbiome studies. However, existing methods primarily focus on the hierarchical structure of phylogenetic trees and overlook the evolutionary distances embedded within them. This study introduces DeepPhylo, a novel method that employs phylogeny-aware amplicon embeddings to integrate abundance and phylogenetic information. DeepPhylo improves both the unsupervised discriminatory power and the supervised predictive accuracy of microbiome data analysis. Compared with existing methods, DeepPhylo yields more biologically relevant insights across five real-world microbiome use cases: clustering of skin microbiomes, prediction of host chronological age, prediction of host gender, diagnosis of inflammatory bowel disease (IBD) across 15 studies, and multilabel disease classification.
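To make the idea of combining abundance with evolutionary distance concrete, the following is a minimal sketch and not the DeepPhylo implementation. It assumes a precomputed matrix of pairwise tree distances between taxa and uses classical multidimensional scaling as a stand-in embedding technique; the function names `phylo_embeddings` and `sample_embeddings` and the toy data are hypothetical.

```python
import numpy as np


def phylo_embeddings(dist, n_components=3):
    """Embed taxa with classical MDS so that Euclidean distances between
    embeddings approximate the given pairwise tree distances.
    (Stand-in technique; an assumption, not the published method.)"""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering      # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


def sample_embeddings(counts, taxa_emb):
    """Abundance-weighted mean of taxon embeddings for each sample."""
    counts = np.asarray(counts, dtype=float)
    rel = counts / counts.sum(axis=1, keepdims=True)  # relative abundance
    return rel @ taxa_emb


# Toy data (hypothetical): taxa A and B are close relatives, as are C and D.
tree_dist = np.array([
    [0.0, 0.2, 1.0, 1.0],
    [0.2, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.2],
    [1.0, 1.0, 0.2, 0.0],
])
# Three samples, each dominated by a single taxon (A, B, and C respectively).
counts = np.array([
    [10, 0, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 10, 0],
])

taxa_emb = phylo_embeddings(tree_dist)
samples = sample_embeddings(counts, taxa_emb)
d_ab = np.linalg.norm(samples[0] - samples[1])  # samples holding A vs. B
d_ac = np.linalg.norm(samples[0] - samples[2])  # samples holding A vs. C
```

In this toy case the first two samples share no taxa, so an abundance-only comparison treats them as maximally different, while the distance-aware embedding places them closer to each other than to the third sample.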