Department of Obstetrics and Gynecology, Peking University Ninth School of Clinical Medicine, Beijing Shijitan Hospital, Beijing 100038, China.
School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China.
Int J Med Inform. 2024 Aug;188:105480. doi: 10.1016/j.ijmedinf.2024.105480. Epub 2024 May 9.
Metabolic syndrome (MetS) is considered an important indicator of cardiometabolic health and contributes to the development of atherosclerosis and type 2 diabetes. The incidence of MetS increases significantly in postmenopausal women; the perimenopausal period is therefore considered a critical phase for prevention. We aimed to use four machine learning methods to predict whether perimenopausal women will develop MetS within 2 years.
Women aged 45-55 years who underwent physical examinations in 2 consecutive years at the Ninth Clinical College of Peking University between January 2021 and December 2022 were included. We extracted 26 features from the physical examination records and used backward selection to retain the 10 features that yielded the largest area under the receiver operating characteristic curve (AUC). Extreme gradient boosting (XGBoost), random forest (RF), multilayer perceptron (MLP), and logistic regression (LR) were used to build the models. Model performance was measured by AUC, accuracy, precision, recall, and F1 score. SHapley Additive exPlanations (SHAP) values were used to identify risk factors for perimenopausal MetS.
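The evaluation metrics named above can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the six example labels and scores are assumptions made up for illustration. AUC is computed here as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = MetS, 0 = no MetS)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: true outcomes and predicted MetS probabilities for six women.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

AUC depends only on the ranking of the predicted scores, whereas accuracy, precision, recall, and F1 depend on the chosen threshold, which is one reason a model can rank first on AUC but not on F1.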
A total of 8,700 women had physical examination records, of whom 2,254 met the inclusion criteria. For predicting MetS events, RF and XGBoost had the highest AUCs (0.96 and 0.95, respectively). XGBoost had the highest F1 score (0.77), followed by RF, LR, and MLP. SHAP values suggested that the five variables with the greatest influence on MetS in this study were waist circumference, fasting blood glucose, high-density lipoprotein cholesterol, triglycerides, and diastolic blood pressure.
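To show how a SHAP-style ranking of variables can arise, the sketch below computes exact Shapley values for a toy three-feature risk score and ranks the features by mean absolute contribution. Everything in it is hypothetical: `toy_risk`, its coefficients, the baseline, and the example rows are invented for illustration and are not the study's model or data. It also simplifies the SHAP approach by replacing "absent" features with a single baseline vector rather than averaging over a background dataset.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple


def shapley_values(predict: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating every feature ordering (small n only)."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)      # start with every feature "absent"
        prev = predict(current)
        for i in order:
            current[i] = x[i]         # reveal feature i
            new = predict(current)
            phi[i] += new - prev      # marginal contribution in this ordering
            prev = new
    return [p / len(orders) for p in phi]


def rank_features(names: Sequence[str],
                  predict: Callable[[Sequence[float]], float],
                  rows: Sequence[Sequence[float]],
                  baseline: Sequence[float]) -> List[Tuple[str, float]]:
    """Rank features by mean absolute Shapley value over the given rows."""
    totals = [0.0] * len(names)
    for row in rows:
        for i, v in enumerate(shapley_values(predict, row, baseline)):
            totals[i] += abs(v)
    means = [t / len(rows) for t in totals]
    return sorted(zip(names, means), key=lambda kv: kv[1], reverse=True)


def toy_risk(f: Sequence[float]) -> float:
    """Hypothetical risk score; the coefficients are invented, not fitted."""
    waist, glucose, hdl = f
    return 0.04 * waist + 0.5 * glucose - 0.8 * hdl + 0.01 * waist * glucose


names = ["waist_cm", "fasting_glucose_mmol_l", "hdl_mmol_l"]
baseline = [80.0, 5.0, 1.5]                       # assumed reference values
rows = [[95.0, 6.2, 1.0], [78.0, 5.1, 1.7], [88.0, 5.8, 1.2]]

phi = shapley_values(toy_risk, rows[0], baseline)
ranking = rank_features(names, toy_risk, rows, baseline)
```

By construction the contributions for one individual sum to the difference between her predicted score and the baseline score, which is the additivity property that makes per-feature attributions comparable across variables.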
We developed a MetS risk prediction model targeted at perimenopausal women using health examination data. The model enables early identification of women at high risk of MetS in this group, with potential benefits for individual health management and wider socio-economic health initiatives.