Kasim Sazzli, Amir Rudin Putri Nur Fatin, Malek Sorayya, Ibrahim Nurulain, Kiew Xue Ning, Nasir Nafiza Mat, Ibrahim Khairul Shafiq, Raja Shariff Raja Ezman
Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia.
Cardiac Vascular and Lung Research Institute, Universiti Teknologi MARA (UiTM), Shah Alam, Malaysia.
PLoS One. 2025 Jun 17;20(6):e0323949. doi: 10.1371/journal.pone.0323949. eCollection 2025.
Cardiovascular disease (CVD) is a significant public health challenge in the Western Pacific region, including Malaysia.
This study aimed to develop and validate machine learning (ML) models to predict 10-year CVD risk in a Malaysian cohort, which could serve as a model for other Asian populations with similar genetic and environmental backgrounds.
Using data from the REDISCOVER Registry (5,688 participants, 2007 to 2017), 30 clinically relevant features were selected and six ML algorithms were trained: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Neural Network (NN), and Naive Bayes (NB). Ensemble models were also created using three commonly used meta-learners: RF, Generalized Linear Model (GLM), and Gradient Boosting Model (GBM). The dataset was split 70:30 into training and test sets, with 5-fold cross-validation to ensure robust performance. Model evaluation was based primarily on the Area Under the Curve (AUC), with additional metrics such as sensitivity, specificity, and the Net Reclassification Index (NRI) used to compare the ML models against traditional risk scores, namely the Framingham Risk Score (FRS) and the Revised Pooled Cohort Equations (RPCE).
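The evaluation design described above (a 70:30 split, 5-fold cross-validation, and AUC as the primary metric) can be sketched in plain Python. This is a minimal illustration, not the study's pipeline: the helper names `split_70_30`, `k_folds`, and `auc` are hypothetical, the shuffling seed is arbitrary, and the abstract does not say whether the split was stratified or how folds were drawn.

```python
import random


def split_70_30(n, seed=0):
    """Shuffle indices 0..n-1 and return (train, test) in a 70:30 ratio.

    Assumption: a simple random split; the abstract does not state
    whether the original split was stratified by outcome.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(0.7 * n))
    return idx[:cut], idx[cut:]


def k_folds(indices, k=5):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]


def auc(labels, scores):
    """AUC as the probability that a randomly chosen case receives a
    higher score than a randomly chosen non-case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# 5,688 is the cohort size reported in the abstract.
train, test = split_70_30(5688)
fold_sizes = [len(validate) for _, validate in k_folds(train, k=5)]

# Toy check of the AUC helper on four hypothetical predictions.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this sketch the cross-validation folds are drawn only from the training indices, so the 30% test set stays untouched until the final AUC comparison.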
The LR model achieved the highest AUC (0.77), outperforming the FRS (AUC = 0.72) and the RPCE (AUC = 0.74). The ensemble model performed robustly but did not significantly exceed the best individual model. SHAP (SHapley Additive exPlanations) analysis identified key predictors such as systolic blood pressure, weight, and waist circumference. The study showed a significant NRI improvement of 13.15% compared to the FRS and 7.00% compared to the RPCE, highlighting the potential of ML approaches to enhance CVD risk prediction in Malaysia. The best-performing model was deployed on a web platform for real-time use, supporting ongoing validation and clinical applicability.
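For readers unfamiliar with the NRI figures quoted above, the following is a minimal sketch of the standard categorical NRI: the net proportion of events moved to a higher risk category plus the net proportion of non-events moved to a lower one. The function names and the 10% and 20% risk cutoffs are assumptions chosen for illustration; the abstract does not state which cutoffs or NRI variant the study used.

```python
from bisect import bisect_right


def categorize(risk, cutoffs=(0.10, 0.20)):
    """Map a predicted risk to a category index (0 = lowest).

    Assumption: cutoffs of 10% and 20% are illustrative only.
    """
    return bisect_right(cutoffs, risk)


def net_reclassification_index(events, old_cat, new_cat):
    """Categorical NRI of a new model against a reference model.

    events:  1 if the participant had an event, else 0
    old_cat: risk category under the reference score
    new_cat: risk category under the new model
    """
    up_e = down_e = up_n = down_n = 0
    for e, old, new in zip(events, old_cat, new_cat):
        if new > old:        # moved to a higher risk category
            up_e += e
            up_n += 1 - e
        elif new < old:      # moved to a lower risk category
            down_e += e
            down_n += 1 - e
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


# Hypothetical toy data: four events followed by four non-events.
events = [1, 1, 1, 1, 0, 0, 0, 0]
old = [0, 0, 1, 1, 1, 1, 0, 0]
new = [1, 0, 1, 1, 0, 1, 0, 1]
nri = net_reclassification_index(events, old, new)
```

In the toy data, one of four events is correctly moved up (+0.25), while one non-event moves down and another moves up (net 0), giving an NRI of 0.25, that is, 25%.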
These findings underscore the effectiveness of ML models in improving CVD risk stratification and decision-making in Malaysia and beyond.