Detecting cardiovascular diseases using unsupervised machine learning clustering based on electronic medical records.

Author information

Hu Ying, Yan Hai, Liu Ming, Gao Jing, Xie Lianhong, Zhang Chunyu, Wei Lili, Ding Yinging, Jiang Hong

Affiliations

Department of Cardiology, National Clinical Research Center for Interventional Medicine, Shanghai Institute of Cardiovascular Diseases, Zhongshan Hospital, Fudan University, Shanghai, 200032, China.

Shanghai Engineering Research Center of AI Technology for Cardiopulmonary Diseases, Zhongshan Hospital, Fudan University, Shanghai, 200032, China.

Publication information

BMC Med Res Methodol. 2024 Dec 19;24(1):309. doi: 10.1186/s12874-024-02422-z.

Abstract

BACKGROUND

Machine learning (ML) models trained on electronic medical records (EMR) have potential for cardiovascular disease (CVD) risk prediction: by integrating a range of patient medical data, they can facilitate timely diagnosis and classification of CVDs. We tested the hypothesis that an unsupervised ML approach utilizing EMR data could be used to develop a new model for detecting prevalent CVD in clinical settings.

METHODS

We included 155,894 patients (aged ≥ 18 years) discharged from Xuhui Hospital, Shanghai, China, between January 2014 and July 2022, including 64,916 CVD cases and 90,979 non-CVD cases. K-means clustering was used to generate clustering models with a predetermined number of clusters k = 2, 4, and 8. Bayes' theorem was used to estimate the models' predictive accuracy.
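
The clustering-plus-Bayes evaluation described above can be sketched in plain Python. This is a minimal, illustrative sketch under stated assumptions, not the authors' code. The abstract does not say how clusters were mapped to CVD/non-CVD labels, so here we assume each cluster is labeled with the class of highest posterior probability P(class | cluster), computed with Bayes' theorem, and that overall accuracy is the cluster-size-weighted posterior of the chosen class. The helper names (`kmeans`, `nearest_centroid`, `bayes_cluster_accuracy`) and the farthest-first initialisation are hypothetical choices made for illustration.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: _sq_dist(point, centroids[j]))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means (hypothetical helper): returns (centroids, cluster labels)."""
    rng = random.Random(seed)
    # Farthest-first (maximin) initialisation, a deterministic simplification
    # of k-means++: start from one random point, then repeatedly add the point
    # farthest from its nearest existing centroid.
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest centroid
        new_labels = [nearest_centroid(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def bayes_cluster_accuracy(labels, y, k):
    """Label each cluster with its maximum-posterior class (assumed scheme).

    P(class | cluster) = P(cluster | class) * P(class) / P(cluster)
    accuracy = sum over clusters of P(best class | cluster) * P(cluster)
    """
    n = len(y)
    classes = sorted(set(y))
    cluster_to_class, accuracy = {}, 0.0
    for j in range(k):
        p_cluster = labels.count(j) / n
        if p_cluster == 0:
            continue  # empty cluster contributes nothing
        posterior = {}
        for c in classes:
            n_c = y.count(c)
            in_both = sum(1 for lab, t in zip(labels, y) if lab == j and t == c)
            p_cluster_given_c = in_both / n_c
            posterior[c] = p_cluster_given_c * (n_c / n) / p_cluster
        best = max(posterior, key=posterior.get)
        cluster_to_class[j] = best
        accuracy += posterior[best] * p_cluster
    return cluster_to_class, accuracy
```

Calling `kmeans(X, k)` for k in (2, 4, 8) and passing the resulting labels to `bayes_cluster_accuracy` mirrors the three models compared in the study. In practice a library implementation such as scikit-learn's `sklearn.cluster.KMeans(n_clusters=k)` would normally replace the hand-written loop. Test-set records could be scored by assigning each one to its `nearest_centroid` and reusing the training cluster-to-class mapping; this too is an assumption rather than a detail reported in the abstract.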

RESULTS

The overall predictive accuracy of the 2-, 4-, and 8-cluster models in the training set was 0.856, 0.8634, and 0.8506, respectively. Similarly, the predictive accuracy of the 2-, 4-, and 8-cluster models in the testing set was 0.8598, 0.8659, and 0.8525, respectively. After principal component analysis (PCA) reduced the data from 19 dimensions to 2, clear separation between CVD cases and non-CVD cases was observed in both the training and testing sets.
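
The reduction from 19 dimensions to 2 for visualisation can be illustrated with a small, dependency-free PCA. This sketch assumes standard PCA on mean-centred features (the abstract does not mention scaling; with EMR variables measured in different units, standardising first would be the usual choice) and finds the top two eigenvectors of the sample covariance matrix by power iteration with Gram-Schmidt deflation. The function name `pca_2d` is hypothetical; scikit-learn's `sklearn.decomposition.PCA(n_components=2)` is the common library route.

```python
import math
import random


def _deflate_and_normalise(w, components):
    """Remove projections onto found components, then scale to unit length."""
    for c in components:
        dot = sum(a * b for a, b in zip(w, c))
        w = [a - dot * b for a, b in zip(w, c)]
    norm = math.sqrt(sum(a * a for a in w))
    return [a / norm for a in w] if norm > 0 else None


def pca_2d(points, n_iter=500, seed=0):
    """Project points onto their top two principal components (needs >= 2 features).

    Returns (projected, components): projected[i] is the 2-D score of point i,
    components holds the two unit-length principal axes.
    """
    n, d = len(points), len(points[0])
    mean = [sum(col) / n for col in zip(*points)]
    centred = [[v - m for v, m in zip(p, mean)] for p in points]
    # Sample covariance matrix C = X^T X / (n - 1)
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        start = [rng.random() + 0.1 for _ in range(d)]
        v = _deflate_and_normalise(start, components)
        for _step in range(n_iter):
            # Power-iteration step: w = C v, kept orthogonal to earlier axes
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            nxt = _deflate_and_normalise(w, components)
            if nxt is None:
                break  # C v vanished: remaining directions carry no variance
            v = nxt
        components.append(v)
    projected = [[sum(a * b for a, b in zip(row, c)) for c in components]
                 for row in centred]
    return projected, components
```

Plotting the two columns of `projected`, coloured by CVD status, gives the kind of 2-D view in which the abstract reports separation between cases and non-cases.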

CONCLUSION

Our findings indicate that EMR data can support the development of a robust model for CVD detection through an unsupervised ML approach. Further investigation using a longitudinal design is needed to refine the model for application in clinical settings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7444/11658374/a25412b26b45/12874_2024_2422_Fig1_HTML.jpg
