Machine learning and statistical approaches for classification of risk of coronary artery disease using plasma cytokines.

Author information

Saharan Seema Singh, Nagar Pankaj, Creasy Kate Townsend, Stock Eveline O, Feng James, Malloy Mary J, Kane John P

Affiliations

Department of Statistics, University of Rajasthan, Jaipur, India.

Volunteer Data Scientist, UCSF Kane Lab, San Francisco, USA.

Publication information

BioData Min. 2021 Apr 15;14(1):26. doi: 10.1186/s13040-021-00260-z.

Abstract

BACKGROUND

According to the 2017 WHO fact sheet, Coronary Artery Disease (CAD) is the leading cause of death worldwide, accounting for 31% of all deaths. The 17.6 million deaths caused by CAD in 2016 underscore the urgent need for earlier, proactive diagnosis. Emerging Machine Learning (ML) techniques can support early detection of CAD, which is crucial to saving lives. Standard techniques such as angiography provide reliable evidence but are invasive, typically expensive, and risky. In contrast, a diagnosis generated by an ML model is non-invasive, fast, accurate, and affordable, so ML algorithms can serve as a supplement or precursor to conventional methods. This research implements and compares the K Nearest Neighbor (k-NN) and Random Forest ML algorithms for a targeted "At Risk" CAD classification using an emerging set of 35 cytokine biomarkers, which are strongly indicative predictive variables and potential targets for therapy. To improve generalizability, mechanisms such as data balancing and repeated k-fold cross-validation for hyperparameter tuning were integrated into the models. To determine how well the models separate "At Risk" CAD from Control, the Area under the Receiver Operating Characteristic curve (AUROC) is used; this metric summarizes the tradeoff between the false positive and true positive rates.
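The evaluation loop described in this paragraph can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the data are synthetic (35 Gaussian features standing in for the cytokine panel), the classes are balanced by construction rather than by a balancing step, and every function name, the candidate values of k, and the fold settings are assumptions chosen for illustration. AUROC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, which is equivalent to the area under the ROC curve.

```python
import math
import random

def auroc(labels, scores):
    """AUROC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_score(train_X, train_y, x, k):
    """Fraction of the k nearest training points (Euclidean) labeled 1."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda item: math.dist(item[0], x))[:k]
    return sum(label for _, label in nearest) / k

def stratified_folds(y, n_splits, rng):
    """Shuffle indices within each class, then deal them round-robin into folds."""
    folds = [[] for _ in range(n_splits)]
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % n_splits].append(i)
    return folds

def repeated_cv_auroc(X, y, k, n_splits=5, n_repeats=3, seed=0):
    """Mean AUROC of k-NN over repeated stratified k-fold cross-validation."""
    rng = random.Random(seed)  # same seed for every k, so folds are comparable
    aucs = []
    for _ in range(n_repeats):
        for test_idx in stratified_folds(y, n_splits, rng):
            held_out = set(test_idx)
            train_X = [X[i] for i in range(len(X)) if i not in held_out]
            train_y = [y[i] for i in range(len(y)) if i not in held_out]
            scores = [knn_score(train_X, train_y, X[i], k) for i in test_idx]
            aucs.append(auroc([y[i] for i in test_idx], scores))
    return sum(aucs) / len(aucs)

def make_toy_data(n_per_class=40, n_features=35, shift=1.0, seed=1):
    """Synthetic stand-in data: class 1 features are shifted by `shift`."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y

# Hyperparameter tuning: pick the k with the best cross-validated AUROC.
X, y = make_toy_data()
cv_results = {k: repeated_cv_auroc(X, y, k) for k in (1, 3, 5, 7)}
best_k = max(cv_results, key=cv_results.get)
print("best k:", best_k, "mean AUROC:", round(cv_results[best_k], 3))
```

In practice the same loop is usually written with a library; for example, scikit-learn provides `RepeatedStratifiedKFold`, `GridSearchCV`, and `roc_auc_score` for these steps.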

RESULTS

Two classifiers were developed, both built on the 35 cytokine predictive features. The Random Forest classifier achieved the best AUROC score of 0.99, with a 95% Confidence Interval (CI) of (0.982, 0.999). The k-NN model achieved the second-best AUROC score of 0.954, with a 95% CI of (0.929, 0.979). An independent t-test yielded a p-value of less than 7.481e-10, indicating that the Random Forest classifier was significantly better than the k-NN classifier with regard to AUROC score. As large-scale efforts gain momentum to enable early, fast, reliable, affordable, and accessible detection of individuals at risk for CAD, powerful ML algorithms can be leveraged as a supplement to conventional methods such as angiography. Early detection can be further improved by incorporating 65 novel and sensitive cytokine biomarkers. Investigating the emerging role of cytokines in CAD can materially enhance the detection of risk and the discovery of disease mechanisms that can lead to new therapeutic modalities.
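The model comparison reported here rests on an independent two-sample t-test of AUROC scores. The sketch below shows how such a statistic can be computed with pooled variance (Student's form). The abstract does not say which samples of AUROC scores were compared or which variant of the test was used, so the per-fold scores and the function name are hypothetical and are not the study's data.

```python
import math
from statistics import mean, variance

def independent_t_statistic(a, b):
    """Student's two-sample t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical per-fold AUROC scores for illustration only (NOT from the paper).
rf_aucs = [0.99, 0.98, 1.00, 0.99, 0.98]
knn_aucs = [0.95, 0.94, 0.97, 0.96, 0.95]

t_stat, dof = independent_t_statistic(rf_aucs, knn_aucs)
print("t =", round(t_stat, 3), "with", dof, "degrees of freedom")
```

A large positive t statistic means the first model's mean AUROC is well above the second's relative to the fold-to-fold variation. Converting it to a p-value requires the t distribution; `scipy.stats.ttest_ind` returns both the statistic and the p-value when SciPy is available.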

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ff1/8050889/52e71cae3a5a/13040_2021_260_Fig1_HTML.jpg
