Longitudinal Microbiome-based Interpretable Machine Learning for Identification of Time-Varying Biomarkers in Early Prediction of Disease Outcomes.

Authors

Dai Yifan, Qian Yunzhi, Qu Yixiang, Guan Wyliena, Xie Jialiu, Wang Duan, Butler Catherine, Dashper Stuart, Carroll Ian, Divaris Kimon, Liu Yufeng, Wu Di

Affiliations

Department of Biostatistics, Gillings School of Global Public Health at University of North Carolina at Chapel Hill.

Department of Nutrition, Gillings School of Global Public Health at University of North Carolina at Chapel Hill.

Publication information

bioRxiv. 2024 Nov 20:2024.10.18.619118. doi: 10.1101/2024.10.18.619118.

DOI:10.1101/2024.10.18.619118

PMID:39605360

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11601495/

Abstract

Information generated from longitudinally sampled microbial data has the potential to illuminate important aspects of development and progression for many human conditions and diseases. Identifying microbial biomarkers and their time-varying effects can not only advance our understanding of pathogenetic mechanisms, but also facilitate early diagnosis and guide optimal timing of interventions. However, longitudinal predictive modeling of highly noisy and dynamic microbial data (e.g., metagenomics) poses analytical challenges. To overcome these challenges, we introduce a robust and interpretable machine-learning-based longitudinal microbiome analysis framework, LP-Micro, that encompasses: (i) longitudinal microbial feature screening via a polynomial group lasso, (ii) disease outcome prediction implemented via machine learning methods (e.g., XGBoost, deep neural networks), and (iii) interpretable association testing between time points, microbial features, and disease outcomes via permutation feature importance. We demonstrate in simulations that LP-Micro can not only identify incident disease-related microbiome taxa but also offer improved prediction accuracy compared to existing approaches. Applications of LP-Micro in two longitudinal microbiome studies with clinical outcomes of childhood dental disease and weight loss following bariatric surgery yield consistently high prediction accuracy. The identified critical early predictive time points are informative and aligned with clinical expectations.
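Step (iii) of the abstract relies on permutation feature importance. The sketch below illustrates the general idea for a whole time point: all features measured at that time point are shuffled jointly across samples, and the resulting drop in accuracy is recorded. This is not the LP-Micro implementation. The function names, the toy data, the column layout, and the stand-in `toy_model` are all assumptions made for illustration, and the sketch computes importance scores only, not the association tests the authors describe.

```python
import random

def accuracy(model, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permute_group(X, cols, rng):
    """Return a copy of X with the given columns shuffled jointly across samples."""
    order = list(range(len(X)))
    rng.shuffle(order)
    X_perm = [list(row) for row in X]
    for i, src in enumerate(order):
        for c in cols:
            X_perm[i][c] = X[src][c]
    return X_perm

def group_permutation_importance(model, X, y, groups, n_repeats=20, seed=0):
    """Mean drop in accuracy when each named column group is permuted."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importance = {}
    for name, cols in groups.items():
        drops = [
            baseline - accuracy(model, permute_group(X, cols, rng), y)
            for _ in range(n_repeats)
        ]
        importance[name] = sum(drops) / n_repeats
    return importance

# Hypothetical toy data: 2 taxa at 3 time points, columns ordered as
# [t1_taxonA, t1_taxonB, t2_taxonA, t2_taxonB, t3_taxonA, t3_taxonB].
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(6)] for _ in range(200)]
# By construction, the outcome depends only on taxon A at time point 2.
y = [int(row[2] > 0.5) for row in X]

def toy_model(row):
    """Stand-in for a fitted classifier (assumption: it has learned the true rule)."""
    return int(row[2] > 0.5)

time_groups = {"t1": [0, 1], "t2": [2, 3], "t3": [4, 5]}
importance = group_permutation_importance(toy_model, X, y, time_groups)
print(importance)
```

In this toy setting only the `t2` group should show a clear accuracy drop, because the outcome was constructed from a single feature at that time point. In practice, `toy_model` would be replaced by the prediction function of a fitted model such as XGBoost or a neural network, evaluated on held-out data.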
