Accurate Prediction of Stroke for Hypertensive Patients Based on Medical Big Data and Machine Learning Algorithms: Retrospective Study.

Author information

Yang Yujie, Zheng Jing, Du Zhenzhen, Li Ye, Cai Yunpeng

Affiliations

Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.

University of Chinese Academy of Sciences, Beijing, China.

Publication information

JMIR Med Inform. 2021 Nov 10;9(11):e30277. doi: 10.2196/30277.

Abstract

BACKGROUND

Stroke risk assessment is an important means of primary prevention, but the applicability of existing stroke risk assessment scales in the Chinese population has always been controversial. A prospective study is a common method of medical research, but it is time-consuming and labor-intensive. Medical big data has been demonstrated to promote disease risk factor discovery and prognosis, attracting broad research interest.

OBJECTIVE

We aimed to establish a high-precision stroke risk prediction model for hypertensive patients based on historical electronic medical record data and machine learning algorithms.

METHODS

Based on the Shenzhen Health Information Big Data Platform, a total of 57,671 patients were screened from 250,788 registered patients with hypertension, of whom 9421 had stroke onset during the 3-year follow-up. In addition to baseline characteristics and historical symptoms, we constructed some trend characteristics from multitemporal medical records. Stratified sampling according to gender ratio and age stratification was implemented to balance the positive and negative cases, and the final 19,953 samples were randomly divided into a training set and test set according to a ratio of 7:3. We used 4 machine learning algorithms for modeling, and the risk prediction performance was compared with the traditional risk scales. We also analyzed the nonlinear effect of continuous characteristics on stroke onset.
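The sampling and feature steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the abstract does not give the matching ratio, the age-band width, or how trend characteristics were computed, so the 1:1 downsampling within each stratum, the 10-year bands, the least-squares slope, and the record fields `sex`, `age`, and `stroke` are all hypothetical choices.

```python
import random
from collections import defaultdict


def trend_slope(times, values):
    """Least-squares slope of repeated measurements over time.

    Assumed stand-in for a "trend characteristic", e.g. change in
    systolic blood pressure per year across visits.
    """
    n = len(times)
    if n < 2:
        return 0.0
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    denom = sum((t - mean_t) ** 2 for t in times)
    if denom == 0:
        return 0.0
    numer = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return numer / denom


def age_band(age, width=10):
    """Lower bound of the age band (width is an assumption)."""
    return age // width * width


def balance_by_strata(records, rng):
    """Within each (sex, age band) stratum, keep equal numbers of
    stroke cases (stroke == 1) and non-cases (stroke == 0)."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for r in records:
        strata[(r["sex"], age_band(r["age"]))][r["stroke"]].append(r)
    balanced = []
    for key in sorted(strata):
        pos, neg = strata[key][1], strata[key][0]
        k = min(len(pos), len(neg))
        balanced.extend(rng.sample(pos, k))
        balanced.extend(rng.sample(neg, k))
    return balanced


def split_train_test(samples, rng, train_fraction=0.7):
    """Random split into training and test sets (7:3 by default)."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Passing a seeded `random.Random` instance keeps the sampling and the split reproducible. Balancing inside each stratum, rather than over the whole cohort, keeps the sex and age distributions of cases and non-cases aligned.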

RESULTS

The tree-based ensemble algorithm extreme gradient boosting (XGBoost) achieved the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.9220, surpassing the other 3 traditional machine learning algorithms. Compared with 2 traditional risk scales, the Framingham Stroke Risk Profile and the Chinese Multiprovincial Cohort Study, our proposed model achieved better performance on the independent validation set, with the AUC increasing by 0.17. Further nonlinear effect analysis revealed the importance of multitemporal trend characteristics in stroke risk prediction, which will benefit the standardized management of hypertensive patients.
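For readers unfamiliar with the metric used in this comparison, the area under the receiver operating characteristic curve equals the probability that a randomly chosen patient who had a stroke receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for stroke, 0 for no stroke.
    scores: predicted risk, higher means more likely to have a stroke.
    Runs in O(n_pos * n_neg) time, which is fine for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy comparison of two hypothetical risk scores on the same patients.
labels = [1, 1, 0, 0]
scale_scores = [0.9, 0.6, 0.7, 0.2]   # one case ranked below a non-case
model_scores = [0.9, 0.8, 0.7, 0.2]   # every case ranked above every non-case
auc_gain = roc_auc(labels, model_scores) - roc_auc(labels, scale_scores)
```

In this toy example the first score ranks 3 of the 4 case/non-case pairs correctly and the second ranks all 4 correctly, so `auc_gain` is the difference between those two proportions. A reported AUC increase, such as the 0.17 above, is read the same way.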

CONCLUSIONS

A high-precision 3-year stroke risk prediction model for hypertensive patients was established, and the model's performance was verified by comparing it with the traditional risk scales. Multitemporal trend characteristics played an important role in stroke onset, and thus the model could be deployed to electronic health record systems to assist in more pervasive, preemptive stroke risk screening, enabling higher efficiency of early disease prevention and intervention.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3eab/8663532/109d4d395e3b/medinform_v9i11e30277_fig1.jpg
