Application of decision tree-based ensemble learning in the classification of breast cancer.

Author information

Ghiasi Mohammad M, Zendehboudi Sohrab

Affiliation

Faculty of Engineering and Applied Science, Memorial University, St. John's, NL A1B 3X5, Canada.

Publication information

Comput Biol Med. 2021 Jan;128:104089. doi: 10.1016/j.compbiomed.2020.104089. Epub 2020 Oct 31.

Abstract

Fine needle aspiration biopsy (FNAB) of suspicious breast lumps is a common screening and diagnostic tool that can be used to distinguish malignant from benign breast cytology. In this study, we first review published work on breast cancer classification in which machine learning and data mining algorithms have been applied to the Wisconsin Breast Cancer Database (WBCD). We then introduce new tools for classifying breast cancer based on the Random Forest (RF) and Extremely Randomized Trees, or Extra Trees (ET), algorithms. Both strategies use decision trees as the underlying classifiers to reach the final classification, and both proceed in four main stages: input identification, determination of the optimal number of trees, voting analysis, and final decision. The models implemented in this research take factors such as uniformity of cell size, bland chromatin, mitoses, and clump thickness as input parameters. According to the statistical analysis, the proposed methods classify the type of breast cancer accurately. The error analysis shows that the RF and ET models offer easy-to-use outcomes and the highest diagnostic performance compared with previous tools and models in the literature for WBCD classification. Among the input factors, uniformity of cell size has the highest relative importance and mitoses the lowest. The RF and ET algorithms are expected to play an important role in screening and diagnosis in medicine and health systems in the near future.
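The four stages named in the abstract can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses the breast cancer dataset bundled with scikit-learn, which is the Wisconsin Diagnostic variant with 30 image-derived features, not the original WBCD with its cytological attributes. The candidate tree counts, the 70/30 split, and the helper names `select_and_evaluate` and `run` are all hypothetical choices made for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Assumption: candidate ensemble sizes picked for illustration only.
CANDIDATE_TREE_COUNTS = (25, 50, 100)


def select_and_evaluate(model_cls, X_train, y_train, X_test, y_test, seed=0):
    """Choose a tree count by cross-validation, then score on held-out data."""
    cv_scores = {}
    for n_trees in CANDIDATE_TREE_COUNTS:
        model = model_cls(n_estimators=n_trees, random_state=seed)
        # Mean 5-fold accuracy on the training split for this ensemble size.
        cv_scores[n_trees] = cross_val_score(model, X_train, y_train, cv=5).mean()
    best_n = max(cv_scores, key=cv_scores.get)

    # Refit with the selected size. scikit-learn combines the trees by
    # averaging their predicted class probabilities (a soft form of voting)
    # and returns the class with the highest average as the final decision.
    final = model_cls(n_estimators=best_n, random_state=seed)
    final.fit(X_train, y_train)
    return best_n, final.score(X_test, y_test), final.feature_importances_


def run(seed=0):
    # Stage 1 (input identification): here, simply all 30 bundled features.
    data = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target,
        test_size=0.3, stratify=data.target, random_state=seed,
    )
    results = {}
    for name, cls in (("RF", RandomForestClassifier), ("ET", ExtraTreesClassifier)):
        best_n, accuracy, importances = select_and_evaluate(
            cls, X_train, y_train, X_test, y_test, seed=seed
        )
        results[name] = {
            "n_trees": best_n,
            "accuracy": accuracy,
            "importances": importances,
            "top_feature": str(data.feature_names[importances.argmax()]),
        }
    return results


if __name__ == "__main__":
    for name, r in run().items():
        print(f"{name}: trees={r['n_trees']} accuracy={r['accuracy']:.3f} "
              f"top feature={r['top_feature']}")
```

The `feature_importances_` attribute gives the impurity-based relative importance of each input, which is one common way to obtain a ranking like the one reported in the abstract. Because the dataset differs, the ranking and accuracy produced here are not comparable to the paper's results.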
