Reviewing ensemble classification methods in breast cancer.

Author affiliations

Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.

Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, Spain.

Publication information

Comput Methods Programs Biomed. 2019 Aug;177:89-112. doi: 10.1016/j.cmpb.2019.05.019. Epub 2019 May 20.

Abstract

CONTEXT

Ensemble methods combine two or more single techniques to solve the same task. The approach was designed to overcome the weaknesses of single techniques and consolidate their strengths. Ensemble methods are now widely used for prediction tasks (e.g. classification and regression) in several fields, including bioinformatics. In particular, researchers have begun to employ ensemble techniques to improve research into breast cancer, as this is the most frequent type of cancer and accounts for most of the deaths among women.
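As a rough illustration of combining several single techniques on the same task, the sketch below applies majority voting to three base classifiers. The classifier functions, feature names and thresholds are all hypothetical and chosen only for the example; they are not taken from the reviewed studies and carry no clinical meaning.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of base classifiers."""
    # most_common(1) yields [(label, count)] for the most frequent label
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical base classifiers, each a threshold rule on one feature.
# Feature names and cut-offs are illustrative assumptions only.
def clf_size(sample):
    return "malignant" if sample["size"] > 2.0 else "benign"

def clf_texture(sample):
    return "malignant" if sample["texture"] > 20.0 else "benign"

def clf_concavity(sample):
    return "malignant" if sample["concavity"] > 0.2 else "benign"

CLASSIFIERS = [clf_size, clf_texture, clf_concavity]

def ensemble_predict(sample, classifiers=CLASSIFIERS):
    """Ask every base classifier for a label, then combine them by majority vote."""
    return majority_vote([clf(sample) for clf in classifiers])

# Two of the three rules fire for this made-up sample
sample = {"size": 2.5, "texture": 10.0, "concavity": 0.3}
print(ensemble_predict(sample))
```

With an odd number of base classifiers and two classes, the vote cannot tie, which is one reason small voting ensembles are often built from three or five members.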

OBJECTIVE AND METHOD

The goal of this study is to analyse the state of the art in ensemble classification methods applied to breast cancer with regard to nine aspects: publication venues, medical tasks tackled, empirical and research types adopted, types of ensembles proposed, single techniques used to construct the ensembles, validation framework adopted to evaluate the proposed ensembles, tools used to build the ensembles, and optimization methods used for the single techniques. The study was conducted as a systematic mapping study.

RESULTS

A total of 193 papers published from the year 2000 onwards were selected from four online databases: IEEE Xplore, ACM Digital Library, Scopus and PubMed. Of the six medical tasks, diagnosis was the most frequently researched, and the experiment-based empirical type and the evaluation-based research type were the dominant approaches in the selected studies. Homogeneous ensembles were the type most widely used to perform the classification task. With regard to single techniques, decision trees, support vector machines and artificial neural networks were those most frequently adopted to build ensemble classifiers. As for the evaluation framework, the Wisconsin Breast Cancer dataset was the one most frequently used by researchers in their experiments, while the most common validation method was k-fold cross-validation. Several tools are available to perform experiments with ensemble classification methods, such as Weka and R software. Few researchers took into account the optimisation of the single techniques of which their proposed ensemble was composed; where tuning was done, grid search was the method most frequently adopted to set the parameters of a single classifier.
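To make the validation and tuning terms above concrete, here is a minimal pure-Python sketch of k-fold cross-validation wrapped in a grid search. It is not the procedure of any reviewed study, which typically relied on tools such as Weka or R. The helper names, the one-feature toy data and the nearest-neighbour stand-in classifier are all assumptions made for illustration.

```python
from collections import Counter

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(fit, xs, ys, k):
    """Mean accuracy over k folds; fit(train_xs, train_ys) returns a predict function."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(predict(xs[i]) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

def make_knn_fit(n_neighbors):
    """Hypothetical stand-in classifier: nearest neighbours on a single feature."""
    def fit(train_xs, train_ys):
        def predict(x):
            order = sorted(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
            votes = [train_ys[i] for i in order[:n_neighbors]]
            return Counter(votes).most_common(1)[0][0]
        return predict
    return fit

def grid_search(make_fit, grid, xs, ys, k):
    """Score every candidate parameter value by cross-validation and keep the best."""
    results = {value: cross_val_accuracy(make_fit(value), xs, ys, k) for value in grid}
    best = max(results, key=results.get)
    return best, results

# Made-up one-feature data with interleaved classes (not the Wisconsin dataset)
XS = [1.0, 6.0, 1.5, 6.5, 2.0, 7.0, 2.5, 7.5, 3.0, 8.0]
YS = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

best, results = grid_search(make_knn_fit, [1, 3], XS, YS, k=5)
print(best, results)
```

The folds here are contiguous and unshuffled, which only works because the toy data interleave the two classes; on real data a shuffled or stratified split would be the safer choice.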

CONCLUSION

This paper reports an in-depth study of the application of ensemble methods to breast cancer. Our results reveal several gaps and issues, and we therefore provide researchers in the field of breast cancer research with recommendations. Moreover, after analysing the papers found in this systematic mapping study, we found that the majority report positive results for the accuracy of ensemble classifiers compared with single classifiers. To aggregate the evidence reported in the literature, it will therefore be necessary to perform a systematic literature review and meta-analysis, in which an in-depth analysis could be conducted so as to confirm the superiority of ensemble classifiers over the classical techniques.
