Ghosh Probir Kumar, Islam Md Aminul, Haque Md Ahshanul, Tariqujjaman Md, Das Novel Chandra, Ali Mohammad, Uddin Md Rasel, Harun Md Golam Dostogir
International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh.
PLoS Comput Biol. 2025 Jul 2;21(7):e1013211. doi: 10.1371/journal.pcbi.1013211. eCollection 2025 Jul.
Hypertension poses a significant public health challenge in low- and middle-income countries. In Bangladesh, the Health, Population and Nutrition Sector Development Program has shown effectiveness in resource-limited settings. Estimating causal effects on hypertension while adjusting for nonlinear observed confounders in the adult population is complex. This study aims to identify predictors of hypertension and to explore observational causal inference for hypertension.
We analyzed data from the 2011 and 2022 Bangladesh Demographic and Health Surveys. The analysis included 11,815 individuals aged 34 years and above. Hypertension was defined as a systolic blood pressure of > 140 mm Hg and/or a diastolic blood pressure of > 90 mm Hg and/or a history of hypertension. We used logistic regression, a random forest model, Double Machine Learning (DML), and Shapley Additive exPlanations (SHAP) based on a pre-defined causal structure.
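The case definition above can be written as a single classification rule. The sketch below is a minimal illustration, not the study's code: the function and parameter names are hypothetical, and it assumes the blood pressure readings (however the surveys combined repeated measurements) and a boolean history flag are already available. It uses the strict ">" thresholds exactly as stated in the abstract.

```python
# Thresholds as stated in the abstract (strict ">"); many guidelines use ">=" instead.
SBP_THRESHOLD_MMHG = 140.0
DBP_THRESHOLD_MMHG = 90.0


def is_hypertensive(sbp: float, dbp: float, history_of_hypertension: bool) -> bool:
    """Classify one respondent under the abstract's stated definition.

    sbp, dbp: systolic / diastolic blood pressure in mm Hg (hypothetical inputs).
    history_of_hypertension: prior diagnosis flag (assumed to be a boolean).
    Any one of the three conditions is sufficient ("and/or").
    """
    return (
        sbp > SBP_THRESHOLD_MMHG
        or dbp > DBP_THRESHOLD_MMHG
        or history_of_hypertension
    )


if __name__ == "__main__":
    print(is_hypertensive(150, 85, False))  # True: systolic above threshold
    print(is_hypertensive(130, 95, False))  # True: diastolic above threshold
    print(is_hypertensive(120, 80, True))   # True: prior history
    print(is_hypertensive(140, 90, False))  # False: at, not above, both thresholds
```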
The dataset included 11,815 individuals, and the prevalence of hypertension was 38.40%. The mean age was 52.76 years (SD: 12.97), and 6826 (58.77%) were male. The random forest model achieved 93% accuracy, with F1-scores of 95% for the non-hypertension class and 91% for the hypertension class, and identified older age, female sex, urban residence, being a worker, greater wealth, self-awareness, and excess body weight as key predictors of hypertension. Individual conditional expectation (ICE) and SHAP plots revealed nonlinear relationships between hypertension and both age and body mass index (BMI). The crude odds ratio (OR) for the association between excess body weight and hypertension was 2.24 (95% CI: 2.07–2.42). After adjustment for age, sex, socioeconomic status (SES), and self-awareness, the OR was 1.97 (95% CI: 1.79–2.17); with the de-biased method, it was 1.30 (95% CI: 1.17–1.43).
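A crude odds ratio compares the odds of hypertension between people with and without excess body weight. The sketch below computes one from a 2×2 table together with a Woolf (log-scale Wald) 95% confidence interval. It is illustrative only: the function name and table layout are hypothetical, the counts in the usage example are made up (the study's counts are not given here), and it ignores survey weights and design effects, which the study's own estimates may have accounted for.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """Crude OR with a Woolf (log-scale Wald) CI from a 2x2 table.

    Assumed layout:
                               hypertension   no hypertension
      excess body weight            a                b
      no excess body weight         c                d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact method")
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) is the square root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


if __name__ == "__main__":
    # Made-up counts purely to show usage.
    or_, lo, hi = odds_ratio_woolf(40, 20, 30, 60)
    print(f"OR = {or_:.2f} (95% CI: {lo:.2f}-{hi:.2f})")  # prints: OR = 4.00 (95% CI: 2.00-8.00)
```

Because the interval is symmetric on the log scale, the point estimate is the geometric mean of the two bounds.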
The study highlights important predictors of hypertension, including age, sex, residence, socioeconomic status (SES), self-awareness, and body weight. The machine learning model achieved an accuracy of 93% in predicting hypertension. The de-biased methods provided a more refined risk estimate. Age and excess body weight contributed significantly to hypertension, with complex interactions and varying marginal effects across different levels of these factors. Awareness programs and targeted interventions are vital to reduce excess body weight and prevent hypertension.