Zhai Jing, Choi Youngwon, Yang Xingyi, Chen Yin, Knox Kenneth, Twigg Homer L, Won Joong-Ho, Zhou Hua, Zhou Jin J
Department of Epidemiology and Biostatistics, College of Public Health, University of Arizona, Tucson, AZ 85724, USA.
Department of Statistics, Seoul National University, Seoul 08826, Korea.
Stat Biosci. 2025 Apr;17(1):191-215. doi: 10.1007/s12561-024-09434-9. Epub 2024 Jun 14.
Evidence linking the microbiome to human health is growing rapidly. The microbiome profile has potential as a novel predictive biomarker for many diseases. However, tables of bacterial counts are typically sparse, and bacteria are classified within a hierarchy of taxonomic levels, ranging from species to phylum. Existing tools focus on identifying microbiome associations at either the community level or a specific, pre-defined taxonomic level. Incorporating the evolutionary relationships between bacteria can enhance data interpretation, because it allows microbiome contributions to be aggregated along the taxonomy, leading to more accurate and interpretable results. We present DeepBiome, a phylogeny-informed neural network architecture, to predict phenotypes from microbiome counts and uncover the microbiome-phenotype association network. It takes microbiome abundance as input and uses phylogenetic taxonomy to guide the neural network's architecture. By leveraging phylogenetic information, DeepBiome reduces the need for extensive tuning of the deep learning architecture, minimizes overfitting, and, crucially, enables visualization of the path from microbiome counts to disease. It is applicable to both regression and classification problems. Simulation studies and real-life data analysis have shown that DeepBiome is both highly accurate and efficient. It offers deep insights into complex microbiome-phenotype associations, even with small to moderate training sample sizes. In practice, the specific taxonomic level at which microbiome clusters tag the association remains unknown. Therefore, the main advantage of the presented method over other analytical methods is that it offers an ecological and evolutionary understanding of host-microbe interactions, which is important for microbiome-based medicine. DeepBiome is implemented using the Python packages Keras and TensorFlow. It is an open-source tool available at https://github.com/Young-won/DeepBiome.
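The idea of letting the taxonomy guide the network's architecture can be illustrated with a minimal sketch. The code below is not DeepBiome's implementation and does not use its API. It is plain Python under stated assumptions: a hypothetical four-genus toy taxonomy, one layer per taxonomic level (genus, family, phylum), and a 0/1 mask that keeps only the weights linking a taxon to its parent. The names `TAXONOMY`, `build_mask`, `masked_layer`, and `forward` are invented for illustration, and activations, bias terms, and training are omitted.

```python
# Toy taxonomy (illustrative assumption): genus -> (family, phylum).
TAXONOMY = {
    "Streptococcus": ("Streptococcaceae", "Firmicutes"),
    "Lactococcus": ("Streptococcaceae", "Firmicutes"),
    "Veillonella": ("Veillonellaceae", "Firmicutes"),
    "Prevotella": ("Prevotellaceae", "Bacteroidetes"),
}


def build_mask(children, parents, parent_of):
    """Return a len(children) x len(parents) matrix of 0/1 entries.

    Entry [i][j] is 1.0 only when parents[j] is the taxonomic parent
    of children[i], so every other connection is switched off.
    """
    return [[1.0 if parent_of[c] == p else 0.0 for p in parents]
            for c in children]


def masked_layer(x, weights, mask):
    """Linear layer whose weights are zeroed outside the taxonomy edges."""
    n_in, n_out = len(mask), len(mask[0])
    return [sum(x[i] * weights[i][j] * mask[i][j] for i in range(n_in))
            for j in range(n_out)]


def ones(mask):
    """All-ones weight matrix with the same shape as the mask."""
    return [[1.0] * len(row) for row in mask]


# Node order at each taxonomic level.
genera = list(TAXONOMY)
families = sorted({fam for fam, _ in TAXONOMY.values()})
phyla = sorted({phy for _, phy in TAXONOMY.values()})

# Child -> parent lookups derived from the toy taxonomy.
genus_to_family = {g: fam for g, (fam, _) in TAXONOMY.items()}
family_to_phylum = {fam: phy for fam, phy in TAXONOMY.values()}

mask_gf = build_mask(genera, families, genus_to_family)
mask_fp = build_mask(families, phyla, family_to_phylum)


def forward(counts, w_gf, w_fp):
    """Propagate genus-level counts up to family and phylum level."""
    family_out = masked_layer(counts, w_gf, mask_gf)
    phylum_out = masked_layer(family_out, w_fp, mask_fp)
    return family_out, phylum_out


# With all-ones weights, each layer simply sums counts within a parent taxon.
counts = [40.0, 10.0, 20.0, 30.0]  # same order as `genera`
family_out, phylum_out = forward(counts, ones(mask_gf), ones(mask_fp))
print(dict(zip(families, family_out)))
print(dict(zip(phyla, phylum_out)))
```

With all-ones weights the masked layers reduce to summing counts within each family and then each phylum, which is the aggregation along the hierarchy that the abstract describes. In a trained network the surviving weights would be learned, and because each one sits on a taxonomy edge, the learned values can be read as a path from microbiome counts to the phenotype.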