Machine learning approaches for biomarker discovery to predict large-artery atherosclerosis.

Affiliations

Artificial Intelligence Center, China Medical University Hospital, Taichung, Taiwan.

Department of Neurology, China Medical University Hospital, Taichung, Taiwan.

Publication information

Sci Rep. 2023 Sep 13;13(1):15139. doi: 10.1038/s41598-023-42338-0.

Abstract

Large-artery atherosclerosis (LAA) is a leading cause of cerebrovascular disease. However, LAA diagnosis is costly and requires professional identification. Many metabolites have been identified as biomarkers of specific traits, but findings regarding suitable biomarkers for the prediction of LAA are inconsistent. In this study, we propose a new method that integrates multiple machine learning algorithms and a feature selection method to handle multidimensional data. Among the six machine learning models, the logistic regression (LR) model exhibited the best prediction performance. The area under the receiver operating characteristic curve (AUC) was 0.92 for the LR model on the external validation set when 62 features were incorporated. In this model, LAA could be well predicted by clinical risk factors, including body mass index, smoking, and medications for controlling diabetes, hypertension, and hyperlipidemia, as well as metabolites involved in aminoacyl-tRNA biosynthesis and lipid metabolism. In addition, we found that 27 features were shared among the five adopted models that provided good results. When these 27 features were used in the LR model, an AUC of 0.93 was achieved. Our study demonstrates the effectiveness of combining machine learning algorithms with recursive feature elimination and cross-validation for biomarker identification. Moreover, we show that using shared features can yield more reliable correlations than any single model, which can be valuable for future identification of LAA.
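The two quantities the abstract relies on, the set of features shared across models and the AUC, can be illustrated with a minimal sketch. This is not the authors' pipeline: the model names, feature names, labels, and scores below are all hypothetical placeholders, and the AUC is computed directly from its pairwise-ranking definition rather than with any particular library.

```python
from itertools import product


def shared_features(selected_by_model):
    """Return the features selected by every model, sorted for stable output."""
    feature_sets = [set(features) for features in selected_by_model.values()]
    if not feature_sets:
        return []
    return sorted(set.intersection(*feature_sets))


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical feature subsets, e.g. as kept by a feature-elimination step
# run separately for each model (names are illustrative only).
selected = {
    "logistic_regression": ["bmi", "smoking", "metabolite_a", "metabolite_b"],
    "random_forest": ["bmi", "smoking", "metabolite_a", "metabolite_c"],
    "svm": ["bmi", "metabolite_a", "smoking", "metabolite_d"],
}
common = shared_features(selected)

# Hypothetical validation labels (1 = LAA, 0 = control) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)

print(common)
print(round(auc, 3))
```

In this toy data, eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9. A model refit on only the shared features would be evaluated in the same way on held-out data.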

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b9f/10499778/cc1bbbd4cd1d/41598_2023_42338_Fig1_HTML.jpg
