Model selection for metabolomics: predicting diagnosis of coronary artery disease using automated machine learning.

Affiliations

Department of Biostatistics, Epidemiology and Informatics, Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA.

Department of Cardiology, Division Heart and Lungs, Utrecht, The Netherlands.

Publication details

Bioinformatics. 2020 Mar 1;36(6):1772-1778. doi: 10.1093/bioinformatics/btz796.

Abstract

MOTIVATION

Selecting the optimal machine learning (ML) model for a given dataset is often challenging. Automated ML (AutoML) has emerged as a powerful tool for automatically selecting ML methods and parameter settings for the prediction of biomedical endpoints. Here, we apply the tree-based pipeline optimization tool (TPOT) to predict angiographic diagnoses of coronary artery disease (CAD). In TPOT, ML models are represented as expression trees, and optimal pipelines are discovered using a stochastic search method called genetic programming. We provide guidelines for TPOT-based ML pipeline selection and optimization based on various clinical phenotypes and high-throughput metabolic profiles in the Angiography and Genes Study (ANGES).
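To make the search idea concrete, here is a toy evolutionary search in plain Python. It is a simplified stand-in, not TPOT's implementation: each pipeline is a linear chain of preprocessors ending in a classifier (a degenerate expression tree), the operator names are hypothetical, and `toy_fitness` is a made-up scoring rule standing in for cross-validated performance. It does use the core ingredients of genetic programming: a population of candidate programs, selection of the fittest, crossover, and mutation.

```python
import random

# Hypothetical operator names for illustration only; these are not TPOT's primitive set.
PREPROCESSORS = ["scale", "select_k_best", "pca", "poly_features"]
CLASSIFIERS = ["logistic", "random_forest", "gradient_boosting"]


def random_pipeline(rng, max_steps=3):
    """A pipeline is a list of 0..max_steps preprocessors followed by one classifier."""
    steps = [rng.choice(PREPROCESSORS) for _ in range(rng.randint(0, max_steps))]
    return steps + [rng.choice(CLASSIFIERS)]


def toy_fitness(pipeline):
    """Made-up score standing in for cross-validated balanced accuracy."""
    base = {"logistic": 0.60, "random_forest": 0.70, "gradient_boosting": 0.72}
    score = base[pipeline[-1]]
    preprocessors = pipeline[:-1]
    if "scale" in preprocessors:
        score += 0.05
    if "select_k_best" in preprocessors:
        score += 0.04
    # Length penalty so that, all else equal, shorter pipelines win.
    return score - 0.01 * len(preprocessors)


def crossover(a, b, rng):
    """One-point crossover on the preprocessor chains; the child keeps a's classifier."""
    pa, pb = a[:-1], b[:-1]
    cut_a, cut_b = rng.randint(0, len(pa)), rng.randint(0, len(pb))
    return pa[:cut_a] + pb[cut_b:] + [a[-1]]


def mutate(pipeline, rng):
    """Insert or delete a preprocessor, or swap the classifier."""
    pre, clf = pipeline[:-1], pipeline[-1]  # slicing copies, so the parent is untouched
    op = rng.choice(["insert", "delete", "swap_classifier"])
    if op == "insert":
        pre.insert(rng.randint(0, len(pre)), rng.choice(PREPROCESSORS))
    elif op == "delete" and pre:
        pre.pop(rng.randrange(len(pre)))
    else:  # swap_classifier, or a delete on an empty chain
        clf = rng.choice(CLASSIFIERS)
    return pre + [clf]


def evolve(generations=20, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=toy_fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=toy_fitness)


best = evolve()
print(best, round(toy_fitness(best), 2))
```

Under these toy rules the best achievable score is 0.79 (`gradient_boosting` preceded by `scale` and `select_k_best`); whether a given seed reaches it is not guaranteed. Because the top half is carried over each generation, the best score never decreases from one generation to the next.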

RESULTS

We analyzed nuclear magnetic resonance-derived lipoprotein and metabolite profiles in the ANGES cohort with the goal of identifying the role of non-obstructive CAD patients in CAD diagnostics. We compared TPOT-generated ML pipelines with selected ML classifiers optimized using a grid search approach, applying both to two phenotypic CAD profiles. TPOT-generated ML pipelines outperformed the grid-search-optimized models across multiple performance metrics, including balanced accuracy and area under the precision-recall curve. Using the selected models, we showed that the phenotypic profile distinguishing non-obstructive CAD patients from patients without CAD is associated with higher precision, suggesting that the underlying processes differ between these phenotypes.
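The two metrics named above can be computed from scratch as follows. This is a minimal sketch in plain Python, assuming binary labels coded 0/1 and at least one positive case. Balanced accuracy averages the recall of each class, so a classifier cannot score well simply by favouring the majority class. For the area under the precision-recall curve, the sketch uses the step-wise average-precision sum, which matches the definition documented for scikit-learn's `average_precision_score`; it is one common estimator, not necessarily the exact one used in the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve: sum of (R_n - R_{n-1}) * P_n.

    Assumes labels are 0/1 and at least one label is 1.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive cases")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Samples tied at the same score enter the predicted-positive set together.
        while i < len(order) and scores[order[i]] == threshold:
            if y_true[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall, precision = tp / n_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


y_true = [1, 0, 1, 1, 0, 0]
print(balanced_accuracy(y_true, [1, 0, 0, 1, 0, 1]))  # each class has recall 2/3
print(average_precision(y_true, [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]))  # 29/36, about 0.806
```

In the usage example, recall rises in steps of 1/3 at thresholds where precision is 1, 2/3 and 3/4, giving 1/3 + 2/9 + 1/4 = 29/36.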

AVAILABILITY AND IMPLEMENTATION

TPOT is freely available via http://epistasislab.github.io/tpot/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
