Department of Statistics, University of Gujrat, Gujrat 50700, Pakistan.
Department of Internal Medicine, College of Medicine, Majmaah University, Almajmaah 11952, Saudi Arabia.
Int J Environ Res Public Health. 2021 Nov 29;18(23):12586. doi: 10.3390/ijerph182312586.
Criticism of applying existing risk prediction models (RPMs) for cardiovascular diseases (CVDs) to new populations has motivated researchers to develop regional models. The predominant use of laboratory features in these RPMs also causes reproducibility issues in low- and middle-income countries (LMICs). Further, conventional logistic regression analysis (LRA), as typically applied, does not account for non-linear associations and interaction terms when developing these RPMs, which might oversimplify the phenomenon. This study aims to develop alternative machine learning (ML)-based RPMs that may predict CVD status from nonlaboratory features better than conventional RPMs. The data came from a case-control study conducted at the Punjab Institute of Cardiology, Pakistan. Data were collected from 460 subjects aged 30 to 76 years, with 1:1 gender-based matching. We tested various ML models to identify the best model or models, with LRA as the baseline RPM. An artificial neural network and a linear support vector machine outperformed the conventional RPM on the majority of performance metrics. The predictive accuracies of the best-performing ML-based RPMs were between 80.86% and 81.09%, higher than the 79.56% of the baseline RPM. The discriminative ability of the ML-based RPMs was also comparable to that of the baseline RPM. Further, the ML-based RPMs identified substantially different rankings of features compared with the baseline RPM. This study concludes that nonlaboratory feature-based RPMs can be a good choice for early risk assessment of CVDs in LMICs. ML-based RPMs can identify a better ranking of features than the conventional approach, which in turn yielded models with improved prognostic capability.
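The comparison described above rests on performance metrics such as accuracy, sensitivity, and specificity, computed for a baseline model and for each candidate model. The following is a minimal sketch of that arithmetic in plain Python. The `ConfusionMatrix` class, the `summarize` helper, and all counts are hypothetical illustrations and are not taken from the study; the abstract does not report confusion matrices or the exact evaluation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the study)."""
    tp: int  # cases correctly predicted as cases
    fn: int  # cases predicted as controls
    fp: int  # controls predicted as cases
    tn: int  # controls correctly predicted as controls

    def accuracy(self) -> float:
        # Share of all subjects classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def sensitivity(self) -> float:
        # Share of true cases that the model detects.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of true controls that the model clears.
        return self.tn / (self.tn + self.fp)


def summarize(cm: ConfusionMatrix) -> dict:
    """Collect the three metrics in one dictionary for side-by-side comparison."""
    return {
        "accuracy": cm.accuracy(),
        "sensitivity": cm.sensitivity(),
        "specificity": cm.specificity(),
    }


# Illustrative counts only (assumed, NOT the study's results).
baseline = summarize(ConfusionMatrix(tp=80, fn=20, fp=25, tn=75))
candidate = summarize(ConfusionMatrix(tp=84, fn=16, fp=22, tn=78))

# List the metrics on which the candidate beats the baseline.
better_on = [name for name in baseline if candidate[name] > baseline[name]]

if __name__ == "__main__":
    for name in baseline:
        print(f"{name}: baseline={baseline[name]:.4f} candidate={candidate[name]:.4f}")
    print("candidate better on:", better_on)
```

A model that wins on "the majority of performance metrics" in this sense is one for which `better_on` covers more than half of the metrics considered; in practice such metrics would usually be estimated on held-out or cross-validated data rather than on a single table of counts.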