
Accurate Prediction of Coronary Heart Disease for Patients With Hypertension From Electronic Health Records With Big Data and Machine-Learning Methods: Model Development and Performance Evaluation.

Author information

Du Zhenzhen, Yang Yujie, Zheng Jing, Li Qi, Lin Denan, Li Ye, Fan Jianping, Cheng Wen, Chen Xie-Hui, Cai Yunpeng

Affiliations

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.

Fiberhome Technologies College, Wuhan Research Institute of Posts and Telecommunications, Wuhan, China.

Publication information

JMIR Med Inform. 2020 Jul 6;8(7):e17257. doi: 10.2196/17257.

DOI: 10.2196/17257
PMID: 32628616
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7381262/
Abstract

BACKGROUND

Prediction of cardiovascular disease risk based on health records has long attracted broad research interest. Despite extensive efforts, prediction accuracy has remained unsatisfactory. This raises the question of whether insufficient data, the statistical and machine-learning methods used, or intrinsic noise has hindered the performance of previous approaches, and how these issues can be alleviated.

OBJECTIVE

Based on a large population of patients with hypertension in Shenzhen, China, we aimed to establish a high-precision coronary heart disease (CHD) prediction model using big data and machine learning.

METHODS

Electronic health record (EHR) data were analyzed for a large cohort of 42,676 patients with hypertension, including 20,156 patients with CHD onset. The records covered the period 1-3 years prior to CHD onset (for CHD-positive cases) or a disease-free follow-up period of more than 3 years (for CHD-negative cases). The population was divided evenly into independent training and test datasets. Various machine-learning methods were applied to the training set to build high-accuracy prediction models, and the results were compared with traditional statistical methods and well-known risk scales. Comparison analyses were performed to investigate the effects of training sample size, factor sets, and modeling approaches on prediction performance.
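
The even division into independent training and test datasets can be illustrated with a minimal sketch. The abstract does not say how the split was drawn, so the seeded random shuffle, the function name `split_evenly`, and the use of integer patient IDs are all assumptions made for illustration.

```python
import random

def split_evenly(patient_ids, seed=0):
    """Shuffle patient IDs and cut them into two halves of (near-)equal size."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

# Hypothetical cohort with the same size as the one reported in the abstract.
train_ids, test_ids = split_evenly(range(42676))
print(len(train_ids), len(test_ids))
```

Splitting by patient rather than by record keeps all records of one patient on the same side, which is one way to keep the test set independent of the training set.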

RESULTS

An ensemble method, XGBoost, achieved high accuracy in predicting 3-year CHD onset for the independent test dataset, with an area under the receiver operating characteristic curve (AUC) value of 0.943. Comparison analysis showed that nonlinear models (K-nearest neighbor AUC 0.908, random forest AUC 0.938) outperformed linear models (logistic regression AUC 0.865) on the same datasets, and machine-learning methods significantly surpassed traditional risk scales or fixed models (eg, Framingham cardiovascular disease risk models). Further analyses revealed that using time-dependent features obtained from multiple records, including both statistical variables and changing-trend variables, improved performance compared to using only static features. Subpopulation analysis showed that feature design had a greater effect on model accuracy than population size. Marginal effect analysis showed that both traditional and EHR factors exhibited highly nonlinear characteristics with respect to the risk scores.
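
The AUC values reported above can be read as the probability that a randomly chosen CHD-positive patient receives a higher risk score than a randomly chosen CHD-negative patient. The sketch below computes this with the rank-sum (Mann-Whitney) formulation, counting ties as one half. It is a generic illustration, not the authors' evaluation code, and the labels and scores are made up.

```python
def roc_auc(labels, scores):
    """AUC via the rank-sum statistic; tied scores share their average rank."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of equal scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

# Toy example: 1 = CHD onset, 0 = no onset; scores are hypothetical model outputs.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the 0.865 to 0.943 range above should be read.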

CONCLUSIONS

We demonstrated that accurate risk prediction of CHD from EHRs is possible given a sufficiently large population of training data. Sophisticated machine-learning methods played an important role in tackling the heterogeneity and nonlinear nature of disease prediction. Moreover, accumulated EHR data over multiple time points provided additional features that were valuable for risk prediction. Our study highlights the importance of accumulating big data from EHRs for accurate disease predictions.
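
To make the idea of features accumulated over multiple time points concrete, the sketch below turns repeated measurements of one variable into statistical variables (mean, standard deviation, minimum, maximum) and a changing-trend variable (least-squares slope over time). The abstract does not define the exact features used, so this particular feature set, the function name `temporal_features`, and the blood pressure readings are assumptions.

```python
from statistics import mean, pstdev

def temporal_features(times, values):
    """Summarise repeated measurements of one variable for one patient."""
    if len(times) != len(values) or len(values) < 2:
        raise ValueError("need at least two paired measurements")
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("measurement times must not all be identical")
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return {
        "mean": v_bar,
        "std": pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
        "slope": sxy / sxx,     # trend: change in value per unit of time
    }

# Hypothetical systolic blood pressure readings (mmHg) at months 0, 6, 12, 18.
feats = temporal_features([0, 6, 12, 18], [140, 144, 148, 152])
print(feats)
```

A static feature would keep only a single reading, whereas the slope captures whether the value is rising or falling across visits.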

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b3ac/7381262/bf9b56d1fc91/medinform_v8i7e17257_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b3ac/7381262/20878a0ad34a/medinform_v8i7e17257_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b3ac/7381262/c070b7ea8e76/medinform_v8i7e17257_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b3ac/7381262/24194cbc91b6/medinform_v8i7e17257_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b3ac/7381262/13c6d7f6bd98/medinform_v8i7e17257_fig5.jpg

Similar articles

1
Accurate Prediction of Coronary Heart Disease for Patients With Hypertension From Electronic Health Records With Big Data and Machine-Learning Methods: Model Development and Performance Evaluation.
JMIR Med Inform. 2020 Jul 6;8(7):e17257. doi: 10.2196/17257.
2
Accurate Prediction of Stroke for Hypertensive Patients Based on Medical Big Data and Machine Learning Algorithms: Retrospective Study.
JMIR Med Inform. 2021 Nov 10;9(11):e30277. doi: 10.2196/30277.
3
Prediction of myopia development among Chinese school-aged children using refraction data from electronic medical records: A retrospective, multicentre machine learning study.
PLoS Med. 2018 Nov 6;15(11):e1002674. doi: 10.1371/journal.pmed.1002674. eCollection 2018 Nov.
4
The Impact of Time Horizon on Classification Accuracy: Application of Machine Learning to Prediction of Incident Coronary Heart Disease.
JMIR Cardio. 2022 Nov 2;6(2):e38040. doi: 10.2196/38040.
5
Predicting post-stroke pneumonia using deep neural network approaches.
Int J Med Inform. 2019 Dec;132:103986. doi: 10.1016/j.ijmedinf.2019.103986. Epub 2019 Oct 1.
6
Machine learning-based prediction of postpartum hemorrhage after vaginal delivery: combining bleeding high risk factors and uterine contraction curve.
Arch Gynecol Obstet. 2022 Oct;306(4):1015-1025. doi: 10.1007/s00404-021-06377-0. Epub 2022 Feb 16.
7
Prediction of In-hospital Mortality in Emergency Department Patients With Sepsis: A Local Big Data-Driven, Machine Learning Approach.
Acad Emerg Med. 2016 Mar;23(3):269-78. doi: 10.1111/acem.12876. Epub 2016 Feb 13.
8
A data-driven approach to predicting diabetes and cardiovascular disease with machine learning.
BMC Med Inform Decis Mak. 2019 Nov 6;19(1):211. doi: 10.1186/s12911-019-0918-5.
9
Explainable machine learning model for predicting the occurrence of postoperative malnutrition in children with congenital heart disease.
Clin Nutr. 2022 Jan;41(1):202-210. doi: 10.1016/j.clnu.2021.11.006. Epub 2021 Nov 10.
10
Risk Prediction of Major Adverse Cardiovascular Events Occurrence Within 6 Months After Coronary Revascularization: Machine Learning Study.
JMIR Med Inform. 2022 Apr 20;10(4):e33395. doi: 10.2196/33395.

Cited by

1
Synergistic review of automation impact of big data, AI, and ML in current data transformative era.
F1000Res. 2025 May 22;14:253. doi: 10.12688/f1000research.161477.2. eCollection 2025.
2
Predicting Early-Onset Colorectal Cancer in Individuals Below Screening Age Using Machine Learning and Real-World Data: Case Control Study.
JMIR Cancer. 2025 Jun 19;11:e64506. doi: 10.2196/64506.
3
Advancements in deep learning for early diagnosis of Alzheimer's disease using multimodal neuroimaging: challenges and future directions.
Front Neuroinform. 2025 May 2;19:1557177. doi: 10.3389/fninf.2025.1557177. eCollection 2025.
4
Optimising coronary imaging decisions with machine learning: an external validation study.
Open Heart. 2025 Apr 24;12(1):e003072. doi: 10.1136/openhrt-2024-003072.
5
Advanced applications in chronic disease monitoring using IoT mobile sensing device data, machine learning algorithms and frame theory: a systematic review.
Front Public Health. 2025 Feb 21;13:1510456. doi: 10.3389/fpubh.2025.1510456. eCollection 2025.
6
Urban and rural disparities in stroke prediction using machine learning among Chinese older adults.
Sci Rep. 2025 Feb 25;15(1):6779. doi: 10.1038/s41598-025-91157-y.
7
Identification of novel serum lipid metabolism potential markers and metabolic pathways for oral cancer: a population-based study.
BMC Cancer. 2025 Jan 30;25(1):177. doi: 10.1186/s12885-025-13561-x.
8
Machine-learning-based prediction of cardiovascular events for hyperlipidemia population with lipid variability and remnant cholesterol as biomarkers.
Health Inf Sci Syst. 2024 Nov 11;12(1):51. doi: 10.1007/s13755-024-00310-w. eCollection 2024 Dec.
9
Artificial intelligence in healthcare: a scoping review of perceived threats to patient rights and safety.
Arch Public Health. 2024 Oct 23;82(1):188. doi: 10.1186/s13690-024-01414-1.
10
Multimodal Identification of Molecular Factors Linked to Severe Diabetic Foot Ulcers Using Artificial Intelligence.
Int J Mol Sci. 2024 Oct 4;25(19):10686. doi: 10.3390/ijms251910686.