Park Jaram, Kim Jeong-Whun, Ryu Borim, Heo Eunyoung, Jung Se Young, Yoo Sooyoung
Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Republic of Korea.
Department of Otorhinolaryngology, Seoul National University Bundang Hospital, Seongnam, Republic of Korea.
J Med Internet Res. 2019 Feb 15;21(2):e11757. doi: 10.2196/11757.
Prevention and management of chronic diseases are the main goals of national health maintenance programs. Previously widely used screening tools, such as Health Risk Appraisal, are limited in achieving this goal because of limitations such as static characteristics, restricted accessibility, and limited generalizability. Hypertension is one of the most important chronic diseases requiring management through the nationwide health maintenance program, and health care providers should inform patients about their risk of complications caused by hypertension.
Our goal was to develop and compare machine learning models that predict high-risk vascular diseases in hypertensive patients so that these patients can manage their blood pressure based on their risk level.
We used a 12-year longitudinal dataset from the nationwide sample cohort, which contains data on 514,866 patients and allows patients' medical histories to be tracked across all health care providers in Korea (N=51,920). To ensure the generalizability of our models, we conducted an external validation using another national sample cohort dataset, published by the National Health Insurance Service and comprising one million different patients. From the two datasets, we obtained data on 74,535 and 59,738 patients with essential hypertension, respectively, and developed machine learning models for predicting cardiovascular and cerebrovascular events. We developed six machine learning models and compared their performance using validation metrics.
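The abstract does not give the evaluation code. As a minimal sketch, assuming binary labels (1 = a cardiovascular or cerebrovascular event) and a model exposed as a plain callable, the validation protocol amounts to computing the same F1-score on a held-out split of the development cohort (the within test) and on the separate cohort (the external test). The `evaluate` helper, the threshold model, and the toy data below are hypothetical illustrations, not the authors' models or data.

```python
from typing import Callable, Dict, Sequence, Tuple

def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2 * precision * recall / (precision + recall); positive class is 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

Dataset = Tuple[Sequence[Sequence[float]], Sequence[int]]

def evaluate(model: Callable[[Sequence[float]], int],
             within_test: Dataset, external_test: Dataset) -> Dict[str, float]:
    """Hypothetical helper: F1 on the within-cohort test split and on the external cohort."""
    results = {}
    for name, (X, y) in (("within", within_test), ("external", external_test)):
        preds = [model(x) for x in X]
        results[name] = f1_score(y, preds)
    return results

# Hypothetical stand-in model: flag a patient as high risk if the first feature >= 0.5.
def threshold_model(x: Sequence[float]) -> int:
    return 1 if x[0] >= 0.5 else 0

# Toy data only, for illustration.
within = ([[0.9], [0.7], [0.2], [0.6], [0.1]], [1, 1, 0, 0, 0])
external = ([[0.8], [0.4], [0.3], [0.7]], [1, 1, 0, 0])
res = evaluate(threshold_model, within, external)
print(res)
```

On this toy data the stand-in model scores an F1 of 0.8 on the within split and 0.5 on the external set. Keeping the metric fixed while swapping only the test cohort is what makes the two scores comparable as a measure of generalization.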
Machine learning algorithms enabled us to detect high-risk patients based on their medical history. The long short-term memory (LSTM)-based algorithm performed best in the within test (F1-score=.772; external test F1-score=.613), whereas the random forest-based risk prediction algorithm generalized better than the other machine learning algorithms (within test F1-score=.757; external test F1-score=.705). With respect to the number of features, the LSTM-based algorithms performed best in the within test regardless of the number of features. In the external test, however, the random forest-based algorithm performed best irrespective of the number of features used.
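The generalization claim can be made concrete with simple arithmetic on the reported scores. As a sketch, assuming the drop in F1 from the within test to the external test as a rough measure of generalization (this measure is our illustration, not a metric named in the study), the long short-term memory (LSTM)-based model loses about three times as much F1 as the random forest:

```python
# F1-scores reported in the abstract: (within test, external test).
reported_f1 = {
    "LSTM": (0.772, 0.613),
    "random forest": (0.757, 0.705),
}

def generalization_gap(within: float, external: float) -> float:
    """Absolute drop in F1 when moving from the within test to the external test."""
    return within - external

gaps = {name: round(generalization_gap(w, e), 3) for name, (w, e) in reported_f1.items()}
for name, gap in gaps.items():
    print(f"{name}: F1 drop = {gap:.3f}")
```

This prints a drop of 0.159 for the LSTM and 0.052 for the random forest, which is why the LSTM leads in the within test while the random forest leads in the external test.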
We developed and compared machine learning models that predict high-risk vascular diseases in hypertensive patients so that these patients can manage their blood pressure based on their risk level. Using such a prediction model, a government could identify high-risk patients at the national level and establish health care policies in advance.