Bora Krishnarjun, Kalimuthusamy Natarajaseenivasan, Gogoi Ananya Jyoti, Garh Namita, Rabidas Manisha, Chanda Gargi, Das Rajshree, Borah Prasanta Kumar
Department of Epidemiology and Nutrition Section, ICMR-Regional Medical Research Centre, NE Region, Dibrugarh, Assam, India.
Department of Microbiology, Centre of Excellence for Life Sciences, Bharathidasan University, Tiruchirappalli, India.
Indian J Med Res. 2025 Apr;161(4):394-405. doi: 10.25259/IJMR_881_2024.
Background & objectives: Hypertension affects a sizable section of the world population and is increasingly recognised as a growing problem. Predicting it with machine learning (ML) algorithms could support its control and prevention. The objective of the present investigation was to assess the applicability of ML approaches to the detection and prediction of hypertension.
Methods: We included 53,301 participants at baseline from a health and demographic surveillance system in Dibrugarh, Assam (Dibrugarh-HDSS). We constructed two models, one at baseline and the other after two years of follow-up. Of the total participants (baseline: 29,402; follow-up: 4,400), 70 per cent were randomly selected to fit seven popular classification models, namely decision tree classifier (DTC), random forest classifier (RFC), support vector machine (SVM), linear discriminant analysis (LDA), logistic regression, AdaBoost classifier, and XGBoost classifier. The data from the remaining 30 per cent were used to evaluate the performance of the models.
Results: In the baseline data, the AdaBoost classifier identified hypertension with the highest accuracy, 87.02 per cent (CI: 86.01-88.03). The highest area under the curve (AUC), 98.37 per cent (CI: 97.36-99.38), was obtained with RFC. For the prediction of risk at two years, the highest average accuracy, 77.57 per cent (CI: 76.6-78.54), was achieved with XGBoost, followed by RFC (77.2 per cent, CI: 76.15-78.25), and the highest AUC, 85.82 per cent (CI: 84.88-86.76), was obtained with RFC.
Interpretation & conclusions: In both the identification and the prediction of hypertension, RFC was found to perform better than the other classifiers. 'Waist circumference' followed by 'body mass index' (BMI) had the highest relative importance in the identification of hypertension, while for two-year risk prediction the baseline 'systolic blood pressure' (SBP), 'diastolic blood pressure' (DBP), and 'BMI' had the highest relative importance. The findings indicate the potential of predictive models to accurately identify high-risk individuals, enable timely interventions, and optimise clinical decision-making.
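The evaluation protocol described in the Methods (a random 70/30 split, with accuracy and AUC computed on the held-out 30 per cent) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the synthetic data, and the single-feature "risk score" with a 0.5 cut-off are all assumptions made for illustration, and the seven classifiers themselves are not reproduced here.

```python
import random


def holdout_split(rows, train_fraction=0.7, seed=0):
    """Randomly partition rows into a training part and a held-out test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that equal the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def demo(n=1000, seed=42):
    """Hypothetical data: one synthetic risk score per person, higher on
    average for hypertensive (label 1) than for normotensive (label 0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        rows.append((rng.gauss(1.0 if label else 0.0, 1.0), label))

    # A real model would be fitted on `train`; this sketch only scores `test`.
    train, test = holdout_split(rows, train_fraction=0.7, seed=seed)
    scores = [s for s, _ in test]
    labels = [y for _, y in test]
    predictions = [1 if s > 0.5 else 0 for s in scores]  # assumed cut-off
    return accuracy(labels, predictions), roc_auc(labels, scores)


if __name__ == "__main__":
    acc, auc = demo()
    print(f"held-out accuracy = {acc:.3f}, AUC = {auc:.3f}")
```

Accuracy depends on the chosen cut-off while AUC depends only on how the scores rank cases, which is one reason the two measures can favour different classifiers, as they do in the Results. The pairwise AUC above is quadratic in the number of cases and is intended only for small illustrative inputs.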