Ashika T, Hannah Grace G
Department of Mathematics, School of Advanced Sciences, Vellore Institute of Technology Chennai, Chennai, India.
Front Digit Health. 2025 Jun 19;7:1609308. doi: 10.3389/fdgth.2025.1609308. eCollection 2025.
Cardiovascular disease (CVD) is a leading global cause of death, necessitating the development of accurate diagnostic models. This study presents an Optimized Rough Set Theory-Machine Learning (RST-ML) framework that integrates Multi-Criteria Decision-Making (MCDM) for effective heart disease (HD) prediction. By utilizing RST for feature selection, the framework minimizes dimensionality while retaining essential information.
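As an illustration of how RST-based feature selection can reduce dimensionality without losing discriminating information, the following minimal sketch computes the rough-set dependency degree and grows a feature subset greedily until it matches the dependency of the full feature set. The abstract does not state which reduct algorithm the authors used; the greedy search, the function names, and the assumption of discrete-valued features are all illustrative choices, not the study's implementation.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows in blocks that carry a single label."""
    if not rows:
        return 0.0
    positive = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return positive / len(rows)


def greedy_reduct(rows, labels, n_attrs):
    """Add the attribute with the largest dependency gain until the subset
    is as informative as the full attribute set (hypothetical heuristic)."""
    target = dependency(rows, labels, list(range(n_attrs)))
    chosen = []
    while dependency(rows, labels, chosen) < target:
        remaining = [a for a in range(n_attrs) if a not in chosen]
        best = max(remaining, key=lambda a: dependency(rows, labels, chosen + [a]))
        chosen.append(best)
    return chosen
```

In a toy table where the first attribute alone determines the class, `greedy_reduct` returns only that attribute, which is the sense in which a reduct retains the essential information. Continuous clinical measurements would need to be discretized first under this sketch.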
The framework employs RST to select relevant features, followed by the integration of nine ML classifiers into five stacked ensemble models through correlation analysis to enhance predictive accuracy and reduce overfitting. The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) ranks the models, with criteria weights assigned using the Method based on the Removal Effects of Criteria (MEREC). Hyperparameter tuning for the top-ranked model, Stack-4, was conducted using GridSearchCV, identifying XGBoost (XG) as the most effective classifier. To assess scalability and generalization, the framework was evaluated on additional datasets covering chronic kidney disease (CKD), obesity levels, and breast cancer. Explainable AI (XAI) techniques were also applied to clarify feature importance and decision-making processes.
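To make the ranking step concrete, the sketch below implements the standard TOPSIS procedure: vector-normalize each criterion, apply the weights, and score each alternative by its relative closeness to the ideal solution. The weights are passed in as given (in the study they come from MEREC, which is not reproduced here), and the function name, the example criteria, and the benefit/cost flags are assumptions made for illustration only.

```python
import math


def topsis(scores, weights, benefit):
    """Return the TOPSIS closeness coefficient for each alternative.

    scores  : one row per alternative (e.g. a stacked model), one column per criterion
    weights : criterion weights, rescaled here to sum to 1
    benefit : True where larger is better, False where smaller is better
    """
    n_criteria = len(weights)
    total_weight = sum(weights)
    # Euclidean norm of each criterion column, used for vector normalization.
    norms = [math.sqrt(sum(row[j] ** 2 for row in scores)) for j in range(n_criteria)]
    weighted = [
        [row[j] / norms[j] * weights[j] / total_weight for j in range(n_criteria)]
        for row in scores
    ]
    columns = list(zip(*weighted))
    ideal = [max(col) if b else min(col) for col, b in zip(columns, benefit)]
    anti_ideal = [min(col) if b else max(col) for col, b in zip(columns, benefit)]
    closeness = []
    for row in weighted:
        d_pos = math.dist(row, ideal)       # distance to the ideal solution
        d_neg = math.dist(row, anti_ideal)  # distance to the anti-ideal solution
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness
```

A call such as `topsis(model_scores, merec_weights, [True, True, False])` would treat the first two criteria as benefits (for example accuracy and F1) and the third as a cost (for example error rate); the alternative with the highest closeness coefficient ranks first. The sketch assumes the alternatives are not all identical, since that would make both distances zero.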
Stack-4 emerged as the highest-performing model, with XGBoost achieving the best predictive accuracy. The application of XAI techniques provided insights into the model's decision-making, highlighting key features influencing predictions.
The findings demonstrate that the RST-ML framework improves HD prediction accuracy. Its successful application to the CKD, obesity-level, and breast cancer datasets suggests that the framework generalizes beyond heart disease and could support timely diagnosis across a range of health conditions.