Detecting cardiovascular diseases using unsupervised machine learning clustering based on electronic medical records.

Authors

Hu Ying, Yan Hai, Liu Ming, Gao Jing, Xie Lianhong, Zhang Chunyu, Wei Lili, Ding Yinging, Jiang Hong

Affiliations

Department of Cardiology, National Clinical Research Center for Interventional Medicine, Shanghai Institute of Cardiovascular Diseases, Zhongshan Hospital, Fudan University, Shanghai, 200032, China.

Shanghai Engineering Research Center of AI Technology for Cardiopulmonary Diseases, Zhongshan Hospital, Fudan University, Shanghai, 200032, China.

Publication information

BMC Med Res Methodol. 2024 Dec 19;24(1):309. doi: 10.1186/s12874-024-02422-z.

DOI:10.1186/s12874-024-02422-z

PMID:39702064

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11658374/

Abstract

BACKGROUND

Machine learning (ML) models trained on electronic medical records (EMR) have potential for cardiovascular disease (CVD) risk prediction: by integrating a range of patient medical data, they can facilitate timely diagnosis and classification of CVDs. We tested the hypothesis that an unsupervised ML approach using EMR data could be used to develop a new model for detecting prevalent CVD in clinical settings.

METHODS

We included 155,894 patients (aged ≥ 18 years) discharged between January 2014 and July 2022 from Xuhui Hospital, Shanghai, China, including 64,916 CVD cases and 90,979 non-CVD cases. K-means clustering was used to generate clustering models with the number of clusters predetermined as k = 2, 4, and 8. Bayes' theorem was used to estimate the models' predictive accuracy.
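The abstract does not spell out how Bayes' theorem is combined with the clusters, so the following is only a minimal sketch of one plausible reading, not the authors' pipeline. It assumes that k-means is fitted on a training set, that P(CVD | cluster) is computed from the training labels with Bayes' theorem, and that a cluster is called CVD when that posterior is at least 0.5. The data are synthetic 19-feature Gaussian blobs, and every name (`make_data`, `kmeans`, `cluster_posteriors`, `accuracy`) is illustrative.

```python
import random


def make_data(n_per_class, dim, seed):
    """Synthetic stand-in for EMR features: two Gaussian blobs (assumption)."""
    rng = random.Random(seed)
    points, labels = [], []
    for y, mu in ((0, 0.0), (1, 10.0)):  # 0 = non-CVD, 1 = CVD
        for _ in range(n_per_class):
            points.append([rng.gauss(mu, 1.0) for _ in range(dim)])
            labels.append(y)
    return points, labels


def assign(p, centroids):
    """Index of the nearest centroid by squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
    )


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[assign(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group (keep it if empty).
        new = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def cluster_posteriors(points, labels, centroids):
    """P(CVD | cluster) = P(cluster | CVD) * P(CVD) / P(cluster)."""
    n, k = len(points), len(centroids)
    in_cluster = [0] * k
    cvd_in_cluster = [0] * k
    for p, y in zip(points, labels):
        c = assign(p, centroids)
        in_cluster[c] += 1
        cvd_in_cluster[c] += y
    n_cvd = sum(labels)
    prior = n_cvd / n
    posteriors = []
    for c in range(k):
        if in_cluster[c] == 0 or n_cvd == 0:
            posteriors.append(prior)  # no evidence: fall back to the prior
            continue
        likelihood = cvd_in_cluster[c] / n_cvd
        evidence = in_cluster[c] / n
        posteriors.append(likelihood * prior / evidence)
    return posteriors


def accuracy(points, labels, centroids, posteriors):
    """Share of points whose cluster-level prediction matches the label."""
    hits = sum(
        (posteriors[assign(p, centroids)] >= 0.5) == bool(y)
        for p, y in zip(points, labels)
    )
    return hits / len(points)


train_x, train_y = make_data(100, 19, seed=1)
test_x, test_y = make_data(50, 19, seed=2)

results = {}
for k in (2, 4, 8):
    centroids = kmeans(train_x, k)
    posteriors = cluster_posteriors(train_x, train_y, centroids)
    results[k] = (
        accuracy(train_x, train_y, centroids, posteriors),
        accuracy(test_x, test_y, centroids, posteriors),
    )
    print(f"k={k}: train={results[k][0]:.4f} test={results[k][1]:.4f}")
```

In this sketch the Bayes posterior reduces algebraically to the fraction of CVD cases inside each cluster; it is written out in likelihood, prior, and evidence form only to mirror the wording of the Methods.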

RESULTS

The overall predictive accuracy of the 2-, 4-, and 8-cluster models in the training set was 0.856, 0.8634, and 0.8506, respectively. In the testing set, the corresponding accuracies were 0.8598, 0.8659, and 0.8525. After principal component analysis reduced the data from 19 dimensions to 2, CVD and non-CVD cases showed marked separation in both the training and testing sets.
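The 19-to-2 reduction can be illustrated with a small principal component analysis written from scratch. This is a hedged sketch on synthetic data, not the study's code: the two classes, the extra spread on the first feature, and all names (`make_data`, `top_component`, `pca_2d`) are assumptions, and power iteration with deflation is simply one standard way to obtain the two leading components.

```python
import random


def make_data(n_per_class, dim, seed):
    """Synthetic 19-feature rows for two classes (assumption, not EMR data)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for y, mu in ((0, 0.0), (1, 3.0)):  # 0 = non-CVD, 1 = CVD
        for _ in range(n_per_class):
            row = [rng.gauss(mu, 1.0) for _ in range(dim)]
            row[0] += rng.gauss(0.0, 3.0)  # extra spread: a clear 2nd component
            rows.append(row)
            labels.append(y)
    return rows, labels


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(mat, v):
    return [dot(row, v) for row in mat]


def top_component(mat, rng, iters=200):
    """Leading eigenpair of a symmetric matrix by power iteration."""
    v = [rng.gauss(0.0, 1.0) for _ in mat]
    for _ in range(iters):
        w = matvec(mat, v)
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    return dot(v, matvec(mat, v)), v


def pca_2d(rows, seed=0):
    """Project rows onto the two leading principal components."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, means)] for r in rows]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    lam1, pc1 = top_component(cov, rng)
    # Deflation: remove the first component, then find the next one.
    deflated = [
        [cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)]
        for i in range(d)
    ]
    lam2, pc2 = top_component(deflated, rng)
    scores = [(dot(r, pc1), dot(r, pc2)) for r in centered]
    return scores, (pc1, pc2), (lam1, lam2)


rows, labels = make_data(100, 19, seed=1)
scores, (pc1, pc2), (lam1, lam2) = pca_2d(rows)

# Distance between the class means along the first component.
by_class = {0: [], 1: []}
for (s1, _), y in zip(scores, labels):
    by_class[y].append(s1)
gap = abs(sum(by_class[1]) / len(by_class[1]) - sum(by_class[0]) / len(by_class[0]))
print(f"explained variances: {lam1:.2f}, {lam2:.2f}; PC1 class gap: {gap:.2f}")
```

Plotting `scores` coloured by `labels` would give the kind of two-dimensional view the Results describe; how cleanly real EMR data separate depends on the features and is not implied by this toy example.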

CONCLUSION

Our findings indicate that EMR data can support the development of a robust CVD detection model through an unsupervised ML approach. Further investigation using a longitudinal design is needed to refine the model for application in clinical settings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7444/11658374/a25412b26b45/12874_2024_2422_Fig1_HTML.jpg
