IRD, Sorbonne University, UMMISCO, 32 Avenue Henri Varagnat, F-93143 Bondy, France.
Institute of Cardiometabolism and Nutrition, ICAN, Integromics, 91 Boulevard de l'Hopital, F-75013, Paris, France.
Gigascience. 2020 Mar 1;9(3). doi: 10.1093/gigascience/giaa010.
Microbiome biomarker discovery for patient diagnosis, prognosis, and risk evaluation is attracting broad interest. Selected groups of microbial features provide signatures that characterize host disease states such as cancer or cardio-metabolic diseases. Yet current predictive models stemming from machine learning still behave as black boxes and seldom generalize well. They are challenging for physicians and biologists to interpret, which makes them difficult to trust and to use routinely in the physician-patient decision-making process. Novel methods that provide interpretability and biological insight are needed. Here, we introduce "predomics", an original machine learning approach inspired by microbial ecosystem interactions and tailored for metagenomics data. It discovers accurate predictive signatures and provides unprecedented interpretability. The decision provided by the predictive model is based on a simple yet powerful score computed by adding, subtracting, or dividing cumulative abundances of microbiome measurements.
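To make the idea of such a score concrete, the following is a minimal Python sketch, not the predomics implementation. It assumes a sample is a mapping from feature name to relative abundance and that a decision is made by comparing the score with a threshold. All function names, feature names, the threshold, and the small `epsilon` guard against division by zero are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Mapping


def cumulative_abundance(sample: Mapping[str, float], features: Iterable[str]) -> float:
    """Sum the abundances of the given features (absent features count as 0)."""
    return sum(sample.get(name, 0.0) for name in features)


def difference_score(sample: Mapping[str, float],
                     added: Iterable[str],
                     subtracted: Iterable[str]) -> float:
    """Score built by adding one group of features and subtracting another."""
    return cumulative_abundance(sample, added) - cumulative_abundance(sample, subtracted)


def ratio_score(sample: Mapping[str, float],
                numerator: Iterable[str],
                denominator: Iterable[str],
                epsilon: float = 1e-9) -> float:
    """Score built by dividing one cumulative abundance by another.

    epsilon is an illustrative guard against an empty denominator group.
    """
    return cumulative_abundance(sample, numerator) / (
        cumulative_abundance(sample, denominator) + epsilon
    )


def predict(score: float, threshold: float) -> int:
    """Assumed decision rule: class 1 if the score exceeds the threshold, else 0."""
    return 1 if score > threshold else 0


# Hypothetical sample with made-up relative abundances.
sample = {"species_A": 0.30, "species_B": 0.10, "species_C": 0.05}

diff = difference_score(sample, ["species_A", "species_B"], ["species_C"])
ratio = ratio_score(sample, ["species_A", "species_B"], ["species_C"])
label = predict(diff, threshold=0.2)
```

Because the score is a plain sum, difference, or ratio over named features, a reader can see directly which features push a sample toward one class or the other, which is the kind of interpretability the abstract emphasizes.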
In tests on more than 100 datasets, we demonstrate that predomics models are simple and highly interpretable. Even with such simplicity, they are at least as accurate as state-of-the-art methods. The family of best models discovered during the learning process makes it possible to distil biological information and to decipher the predictability signatures of the studied condition. In a proof-of-concept experiment, we successfully predicted body corpulence and metabolic improvement after bariatric surgery using pre-surgery microbiome data.
Predomics is a new algorithm that helps provide reliable and trustworthy diagnostic decisions in the microbiome field. Predomics is in accord with societal and legal requirements that call for an explainable artificial intelligence approach in the medical field.