Screening hypertension using non-laboratory risk factors with machine learning: a retrospective cross-sectional study in Indonesia.

Author information

Estiko Reza Ishak, Widyantoro Bambang, Juzar Dafsah Arifa, Yusup Ramdhan Maulana, Rakhmat Iqbal Fauzi, Rijanto Estiko

Affiliations

Prof. Dr. Margono Soekarjo General Hospital, Purwokerto, Central Java, Indonesia.

Department of Cardiology and Vascular Medicine, Faculty of Medicine, Universitas Indonesia/National Cardiovascular Center Harapan Kita, Jakarta, Special Capital Region of Jakarta, Indonesia.

Publication information

BMJ Open. 2025 Aug 27;15(8):e092364. doi: 10.1136/bmjopen-2024-092364.

DOI:10.1136/bmjopen-2024-092364

PMID:40866068

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12406909/

Abstract

OBJECTIVE

This study aimed to screen for hypertension in a large Indonesian population using machine learning (ML) and 11 non-laboratory risk factors, and to assess the models through internal and external validation.

SETTING AND PARTICIPANTS

Of the 1 782 365 participants aged 15 years and above who were registered at Integrated Counseling Post primary care centres across Indonesia from 2014 to 2017, 268 210 were included in the analysis after records with incomplete data or outliers were excluded. The dataset was split deterministically into a training set of 204 315 participants, evaluated with 10-fold internal cross-validation, and an external validation set of 63 895 participants.
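The split and the 10-fold scheme described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not state the rule behind the deterministic split, so the sketch assumes rows are assigned by position, and the helper names (`deterministic_split`, `fold_sizes`, `kfold_indices`) are hypothetical.

```python
from typing import Iterator, List, Tuple

# Counts reported in the abstract.
N_TOTAL = 268_210
N_TRAIN = 204_315
N_EXTERNAL = 63_895


def deterministic_split(n_total: int, n_train: int) -> Tuple[range, range]:
    """Assumed rule: the first n_train row positions form the training set."""
    return range(0, n_train), range(n_train, n_total)


def fold_sizes(n: int, k: int = 10) -> List[int]:
    """Sizes of k near-equal folds; the first n % k folds get one extra row."""
    return [n // k + (1 if i < n % k else 0) for i in range(k)]


def kfold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) for k contiguous, non-overlapping folds."""
    start = 0
    for size in fold_sizes(n, k):
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n))
        yield train_idx, val_idx
        start = stop


# The two subsets account for every included participant.
train_rows, external_rows = deterministic_split(N_TOTAL, N_TRAIN)
assert len(train_rows) + len(external_rows) == N_TOTAL
```

Under these assumptions the external validation set holds about 24% of the included participants, and the 204 315 training rows divide into five folds of 20 432 and five folds of 20 431.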

DESIGN

This retrospective cross-sectional study used three ML algorithms (random forest, gradient boosting and extreme gradient boosting (XGBoost)) and compared them against logistic regression as a benchmark for screening hypertension, defined by the WHO and International Society of Hypertension criteria. The risk factors were ranked by importance. Screening performance was evaluated in terms of sensitivity and area under the receiver operating characteristic curve (AUC), with age, waist circumference (WC) and body mass index (BMI) entered either as continuous or as categorical variables.
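For readers less familiar with the two reported metrics, the sketch below computes them from scratch. Sensitivity is the share of true hypertension cases that are flagged; AUC is computed here as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case (ties count as one half). The labels and scores are made-up toy values, not study data, and the function names are illustrative.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("no positive cases in y_true")
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores.

    Quadratic in the number of samples, so suitable for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def predict_at(scores: Sequence[float], threshold: float) -> list:
    """Turn scores into 0/1 screening decisions at a given threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Toy example (hypothetical values).
y_true = [0, 0, 0, 1, 1]
scores = [0.2, 0.3, 0.6, 0.5, 0.9]
auc = roc_auc(y_true, scores)
sens_low = sensitivity(y_true, predict_at(scores, 0.4))
sens_high = sensitivity(y_true, predict_at(scores, 0.55))
```

Lowering the decision threshold raises sensitivity at the cost of more false positives, whereas AUC does not depend on any single threshold. This is why a screening model can combine very high sensitivity with a moderate AUC.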

RESULTS

External validation showed that the XGBoost model performed best in hypertension screening. The external validation with partly continuous variables yielded a sensitivity of 0.97 and an AUC of 0.75, indicating excellent screening capability. In descending order of importance, the risk factors were family history of hypertension (FH-HTN), age, WC, BMI, occupation, education, sex, smoking, low physical activity, lack of fruit or vegetable intake and alcohol consumption.
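The abstract does not say how the importance ranking was derived. One common model-agnostic option is permutation importance, sketched below purely as an illustration: shuffle one feature at a time and record how much a performance metric drops. The data are synthetic, the toy predictor stands in for a fitted model, and all names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def permutation_importance(
    predict: Callable[[List[float]], int],
    X: List[List[float]],
    y: Sequence[int],
    seed: int = 0,
) -> List[float]:
    """Accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    drops = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        # Rebuild the rows with only column j replaced by its shuffled values.
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        permuted = accuracy(y, [predict(row) for row in X_perm])
        drops.append(baseline - permuted)
    return drops


# Synthetic data: feature 0 determines the label, feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(500)]
y = [1 if row[0] > 0.5 else 0 for row in X]


def toy_predict(row: List[float]) -> int:
    """Stand-in for a fitted model; it only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


drops = permutation_importance(toy_predict, X, y)
ranking = sorted(range(len(drops)), key=lambda j: drops[j], reverse=True)
```

Here the informative feature ranks first and the noise feature shows no drop at all. Tree ensembles such as XGBoost also expose built-in gain-based importances, which may rank features differently from a permutation-based approach.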

CONCLUSIONS

By using 11 easy-to-collect non-laboratory risk factors, the ML model successfully screens for hypertension with better performance than the benchmark. Using the numerical variables of age, WC and BMI yields a better discrimination capability than the categorical variables. FH-HTN and age are the two top risk factors for the development of hypertension. This study is a useful academic exercise and shows ML's importance in handling large data sets.
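One reason continuous inputs can discriminate better than categorical ones is that binning discards within-category differences. The sketch below illustrates this with BMI, assuming the general WHO adult cut-offs (18.5, 25 and 30 kg/m²); the study's own category boundaries are not given in the abstract and may differ.

```python
from bisect import bisect_right
from typing import List

# Assumed cut-offs: general WHO adult BMI categories (kg/m^2).
BMI_CUTOFFS = [18.5, 25.0, 30.0]
BMI_LABELS = ["underweight", "normal", "overweight", "obese"]


def categorize(value: float, cutoffs: List[float], labels: List[str]) -> str:
    """Map a continuous value to its category; each cut-off opens the next bin."""
    return labels[bisect_right(cutoffs, value)]


# Two hypothetical participants with clearly different BMI values.
bmi_a, bmi_b = 25.1, 29.9
cat_a = categorize(bmi_a, BMI_CUTOFFS, BMI_LABELS)
cat_b = categorize(bmi_b, BMI_CUTOFFS, BMI_LABELS)
```

Both hypothetical participants fall into the same "overweight" category, so a model given only the category cannot distinguish them, while a model given the numeric BMI can.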

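Abstract (Chinese version, translated)

OBJECTIVE

This study aimed to screen for hypertension in a large Indonesian population using machine learning (ML) and 11 non-laboratory risk factors, and to validate the results through internal and external validation.

SETTING AND PARTICIPANTS

Of the initial 1 782 365 participants aged 15 and above who were registered at Integrated Counseling Post primary care centres across Indonesia from 2014 to 2017, those with incomplete data and outliers were excluded, and 268 210 participants were included in the analysis. The dataset was split deterministically into a training set of 204 315 participants, used with 10-fold internal cross-validation, and a separate external validation set of 63 895 participants.

DESIGN

This retrospective cross-sectional study used three ML algorithms, namely random forest, gradient boosting and extreme gradient boosting (XGBoost), and compared them against logistic regression as a benchmark for screening hypertension according to the WHO and International Society of Hypertension criteria. The risk factors were ranked by importance. Screening performance was evaluated in terms of sensitivity and area under the receiver operating characteristic curve (AUC), with the age, waist circumference (WC) and body mass index (BMI) risk factors entered either as continuous or as categorical variables.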