Al Jowf Ghazi I, Kolhar Manjur
Department of Public Health, College of Applied Medical Sciences, King Faisal University, Al Hofuf, 37912, Al Ahsa, Saudi Arabia.
Department of Health Information Management and Technology, College of Applied Medical Sciences, King Faisal University, Al Hofuf, 37912, Al Ahsa, Saudi Arabia.
Sci Rep. 2025 Jul 2;15(1):23455. doi: 10.1038/s41598-025-07874-x.
This research emphasizes the role of analytics in evaluating the risk of cardiovascular disease (CVD), focusing on thorough data preparation and feature engineering for accurate predictions. We studied machine learning (ML) and deep learning (DL) models, including Logistic Regression (LR), Random Forest (RF), Gradient Boosting Machines (GBM), and Multilayer Perceptron (MLP). Each model's performance was assessed using accuracy, precision, recall, F1 score, and ROC AUC to determine its reliability and practical relevance. Our analysis shows the strengths of each model category. Conventional ML models such as Random Forest and Gradient Boosting Machines were effective in identifying at-risk patients, achieving up to 74% accuracy and 72% recall. On the other hand, deep learning models such as the Multilayer Perceptron excelled at handling the data, with a ROC AUC of approximately 0.80 (80%). Although these models require substantial resources and extensive data preprocessing, they are highly effective at pinpointing the risk factors that are crucial for long-term CVD management.
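The five evaluation metrics named above can be made concrete with a small, dependency-free sketch. It assumes binary labels (1 = at risk of CVD, 0 = not at risk), model scores in [0, 1], and a 0.5 decision threshold; the function names and the toy labels and scores are hypothetical illustrations, not the study's code or data. The study's actual tooling (for example, a library such as scikit-learn) is not stated here.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), assuming labels are 0/1 with 1 = at risk."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0: an at-risk patient the model missed
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of at-risk patients found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as 1/2). O(P * N) pairwise version, fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.3, 0.6, 0.4, 0.2, 0.7, 0.8, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(classification_metrics(y_true, y_pred))  # all four equal 0.75 here
    print(roc_auc(y_true, scores))  # 0.875
```

Note that accuracy, precision, recall and F1 depend on the chosen threshold, whereas ROC AUC is computed from the raw scores. This is one reason a model can rank patients well (high ROC AUC) while its accuracy or recall at a fixed threshold is more modest.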