Machine learning and statistical approaches for classification of risk of coronary artery disease using plasma cytokines.

Authors

Saharan Seema Singh, Nagar Pankaj, Creasy Kate Townsend, Stock Eveline O, Feng James, Malloy Mary J, Kane John P

Affiliations

Department of Statistics, University of Rajasthan, Jaipur, India.

Voluntary Data Scientist UCSF Kane Lab, San Francisco, USA.

Publication information

BioData Min. 2021 Apr 15;14(1):26. doi: 10.1186/s13040-021-00260-z.

DOI:10.1186/s13040-021-00260-z

PMID:33858484

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8050889/

Abstract

BACKGROUND

According to the 2017 WHO fact sheet, coronary artery disease (CAD) is the leading cause of death worldwide, accounting for 31% of all deaths. The 17.6 million deaths attributed to CAD in 2016 underscore the urgent need for earlier, proactive diagnosis. Machine learning (ML) techniques can support early detection of CAD, which is crucial to saving lives. Standard techniques such as angiography provide reliable evidence but are invasive, typically expensive, and carry risk. In contrast, a diagnosis generated by an ML model is non-invasive, fast, accurate, and affordable, so ML algorithms can serve as a supplement or precursor to conventional methods. This research implements and compares k-nearest neighbor (k-NN) and Random Forest ML algorithms for a targeted "At Risk" CAD classification using an emerging set of 35 cytokine biomarkers, which are strongly predictive variables and potential targets for therapy. To improve generalizability, the models incorporate data balancing and repeated k-fold cross-validation for hyperparameter tuning. To assess how well the models separate "At Risk" CAD from Control, the area under the receiver operating characteristic curve (AUROC) is used; it summarizes the tradeoff between the true positive and false positive rates.
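The evaluation loop described above (class balancing, repeated k-fold cross-validation for tuning, AUROC as the score) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the synthetic data, the oversampling scheme, the stratified fold assignment, the candidate values of k, and every function name are assumptions made for the example. Only the k-NN model is shown; a Random Forest would plug into the same loop.

```python
import math
import random

def auroc(labels, scores):
    """AUROC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_score(train_X, train_y, x, k):
    """Fraction of the k nearest training points (Euclidean distance) labelled 1."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(label for _, label in nearest) / k

def oversample_minority(X, y, rng):
    """Balance classes by resampling the minority class with replacement (assumed scheme)."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    small, large = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(small) for _ in range(len(large) - len(small))]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]

def repeated_kfold_auroc(X, y, k, n_folds=5, n_repeats=3, seed=0):
    """Return one AUROC per held-out fold, over n_repeats reshuffled stratified splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        pos = [i for i, label in enumerate(y) if label == 1]
        neg = [i for i, label in enumerate(y) if label == 0]
        rng.shuffle(pos)
        rng.shuffle(neg)
        # Deal each class round-robin so every fold contains both classes.
        folds = [pos[f::n_folds] + neg[f::n_folds] for f in range(n_folds)]
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            # Balance only the training part so no held-out sample leaks into training.
            tr_X, tr_y = oversample_minority([X[i] for i in train], [y[i] for i in train], rng)
            preds = [knn_score(tr_X, tr_y, X[i], k) for i in fold]
            scores.append(auroc([y[i] for i in fold], preds))
    return scores

def make_synthetic(n_pos=30, n_neg=60, n_features=5, shift=1.5, seed=1):
    """Hypothetical stand-in for cytokine measurements: two shifted Gaussian clouds."""
    rng = random.Random(seed)
    X = [[rng.gauss(shift, 1.0) for _ in range(n_features)] for _ in range(n_pos)]
    X += [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_neg)]
    return X, [1] * n_pos + [0] * n_neg

X, y = make_synthetic()
results = {k: repeated_kfold_auroc(X, y, k) for k in (1, 3, 5, 7)}
mean_auroc = {k: sum(v) / len(v) for k, v in results.items()}
best_k = max(mean_auroc, key=mean_auroc.get)
print("mean AUROC per k:", mean_auroc)
print("selected k:", best_k)
```

Balancing inside each training fold, rather than once on the full data set, is a deliberate choice in this sketch: it keeps duplicated minority samples out of the held-out fold, which would otherwise inflate the AUROC.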

RESULTS

Two classifiers were developed, both built on the 35 cytokine predictive features. The Random Forest classifier achieved the best AUROC, 0.99 (95% confidence interval (CI): 0.982 to 0.999). The k-NN model achieved the second-best AUROC, 0.954 (95% CI: 0.929 to 0.979). An independent t-test gave a p-value below 7.481e-10, indicating that the Random Forest classifier was significantly better than the k-NN classifier with respect to AUROC. As large-scale efforts gain momentum to enable early, fast, reliable, affordable, and accessible detection of individuals at risk for CAD, ML algorithms can be used as a supplement to conventional methods such as angiography. Early detection can be further improved by incorporating 65 novel and sensitive cytokine biomarkers. Investigating the emerging role of cytokines in CAD can materially enhance risk detection and the discovery of disease mechanisms that may lead to new therapeutic modalities.
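The comparison of the two classifiers by an independent t-test can be illustrated with a short sketch. The abstract does not state which t-test variant was used or which samples were compared, so this example assumes Student's pooled-variance test applied to per-fold AUROC scores. The score lists below are made-up numbers for illustration only, not the study's data, and all function names are hypothetical.

```python
import math
from statistics import mean, variance

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p(t, df, steps=20000):
    """Two-sided p-value as 1 - 2 * integral of the density over [0, |t|] (Simpson's rule).

    Good enough for illustration; values far below about 1e-12 are lost to rounding.
    """
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def independent_t_test(a, b):
    """Student's two-sample t-test with pooled variance (assumes equal variances)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df, two_sided_p(t, df)

# Hypothetical per-fold AUROC scores, invented for this example.
rf_scores = [0.99, 0.98, 0.995, 0.985, 0.99]
knn_scores = [0.95, 0.96, 0.94, 0.955, 0.965]

t_stat, dof, p_value = independent_t_test(rf_scores, knn_scores)
print(f"t = {t_stat:.3f}, df = {dof}, p = {p_value:.2e}")
```

Scores from overlapping cross-validation folds are not strictly independent, so a test like this tends to overstate significance; a paired or corrected resampled test is a common alternative when both models are evaluated on the same folds.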

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ff1/8050889/52e71cae3a5a/13040_2021_260_Fig1_HTML.jpg
