Mahajan Palak, Uddin Shahadat, Hajati Farshid, Moni Mohammad Ali
College of Engineering and Science, Victoria University, Sydney, NSW 2000, Australia.
School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, NSW 2037, Australia.
Healthcare (Basel). 2023 Jun 20;11(12):1808. doi: 10.3390/healthcare11121808.
Machine learning models are used to build and improve disease prediction frameworks. Ensemble learning is a machine learning technique that combines multiple classifiers to make more accurate predictions than a single classifier. Although numerous studies have employed ensemble approaches for disease prediction, commonly used ensemble approaches have not been thoroughly assessed against extensively researched diseases. This study therefore aims to identify significant trends in the accuracy of four ensemble techniques (bagging, boosting, stacking, and voting) across five extensively researched diseases (diabetes, skin disease, kidney disease, liver disease, and heart conditions). Using a well-defined search strategy, we identified 45 articles published between 2016 and 2023 that applied two or more of the four ensemble approaches to any of these five diseases. Although stacking was used less often (23 times) than bagging (41) or boosting (37), it was the most accurate approach in 19 of those 23 cases. Voting was the second-best ensemble approach in this review. In the reviewed articles, stacking was always the most accurate approach for skin disease and diabetes. Bagging performed best for kidney disease (five out of six times), and boosting for liver disease and diabetes (four out of six times). Overall, stacking demonstrated greater accuracy in disease prediction than the other three candidate algorithms. Our study also demonstrates variability in the perceived performance of different ensemble approaches against frequently used disease datasets.
The findings of this work will help researchers better understand current trends and hotspots in disease prediction models that employ ensemble learning, and help them select a more suitable ensemble model for predictive disease analytics.
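As a rough illustration of what "combining multiple classifiers" means, the sketch below shows hard (majority) voting, the simplest of the four ensemble approaches named above. It is not taken from the reviewed articles: the three rule-based classifiers, their thresholds, and the patient record are hypothetical placeholders chosen only to make the example runnable, and they carry no clinical meaning.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted most often (hard voting)."""
    return Counter(labels).most_common(1)[0][0]


def voting_ensemble(classifiers, sample):
    """Ask every base classifier for a label, then take the majority."""
    return majority_vote([clf(sample) for clf in classifiers])


# Hypothetical base classifiers: each returns 1 (at risk) or 0 (not at risk).
# The thresholds are illustrative assumptions, not clinical guidance.
def glucose_rule(patient):
    return 1 if patient["glucose"] >= 126 else 0


def bmi_rule(patient):
    return 1 if patient["bmi"] >= 30 else 0


def age_rule(patient):
    return 1 if patient["age"] >= 45 else 0


# Hypothetical patient record.
patient = {"glucose": 140, "bmi": 27.5, "age": 52}

# Two of the three rules vote 1, so the ensemble predicts 1.
prediction = voting_ensemble([glucose_rule, bmi_rule, age_rule], patient)
print(prediction)
```

Bagging, boosting, and stacking differ in how the base classifiers are trained and combined (resampled training sets, sequential reweighting, and a learned meta-classifier, respectively), but all of them build on this idea of aggregating several predictions into one.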