Modified Bat Algorithm for Feature Selection with the Wisconsin Diagnosis Breast Cancer (WDBC) Dataset.

Author information

Jeyasingh Suganthi, Veluchamy Malathi

Affiliation

Department of Computer Science and Engineering, Raja College of Engineering and Technology, Madurai, Tamilnadu, India. Email:

Publication information

Asian Pac J Cancer Prev. 2017 May 1;18(5):1257-1264. doi: 10.22034/APJCP.2017.18.5.1257.

DOI:10.22034/APJCP.2017.18.5.1257

PMID:28610411

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5555532/

Abstract

Early diagnosis of breast cancer is essential to save patients' lives. Medical datasets usually include a large variety of data, which can lead to confusion during diagnosis. The Knowledge Discovery in Databases (KDD) process helps to improve efficiency, but it requires inappropriate and duplicate data to be eliminated from the dataset before the final diagnosis. This can be done with any of the feature selection algorithms available in data mining. Feature selection is considered a vital step in increasing classification accuracy. This paper proposes a Modified Bat Algorithm (MBA) for feature selection that eliminates irrelevant features from the original dataset. The Bat algorithm was modified by using simple random sampling to select random instances from the dataset. Ranking was then performed with the global best features to identify the predominant features in the dataset. The selected features are used to train a Random Forest (RF) classifier. The MBA feature selection algorithm improved the classification accuracy of RF in identifying breast cancer. The Wisconsin Diagnosis Breast Cancer Dataset (WDBC) was used to evaluate the performance of the proposed MBA feature selection algorithm. The proposed algorithm achieved better performance in terms of the Kappa statistic, Matthews Correlation Coefficient (MCC), Precision, F-measure, Recall, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE).
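
The abstract describes the MBA only at a high level, so the following is a minimal sketch of a generic binary bat algorithm for feature selection, not the authors' implementation. It uses Yang's bat-algorithm loop (frequency, velocity, loudness A_i, pulse rate r_i) with a sigmoid transfer function to turn velocities into bit probabilities, plus the two ingredients the abstract names: a simple random sample of instances on which masks are scored, and a ranking derived from the global best. Assumptions: the sample is drawn once, the local search flips one bit of the global best, features are ranked by how many iterations they appear in the global best, and all parameter defaults are illustrative. `toy_fitness` is a hypothetical stand-in; in practice `fitness` might train scikit-learn's `RandomForestClassifier` on the sampled rows and selected columns and return a cross-validated accuracy (it would also need to handle an empty mask).

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def sigmoid(v: float) -> float:
    """Transfer function: maps a real-valued velocity to P(bit = 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def binary_bat_feature_selection(
    n_features: int,
    n_instances: int,
    fitness: Callable[[Mask, Sequence[int]], float],
    n_bats: int = 10,
    n_iter: int = 30,
    sample_frac: float = 0.7,
    f_min: float = 0.0,
    f_max: float = 2.0,
    alpha: float = 0.9,
    gamma: float = 0.9,
    v_max: float = 6.0,
    seed: int = 0,
) -> Tuple[Mask, List[int]]:
    """Return (global best mask, feature indices ranked by time spent in the global best)."""
    rng = random.Random(seed)
    # Assumption: one simple random sample of row indices; every mask is scored on it.
    sample = rng.sample(range(n_instances), max(1, int(sample_frac * n_instances)))

    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_bats)]
    vel = [[0.0] * n_features for _ in range(n_bats)]
    loud = [1.0] * n_bats                       # loudness A_i
    r0 = [rng.random() for _ in range(n_bats)]  # initial pulse rates
    rate = [0.0] * n_bats                       # current pulse rate r_i
    scores = [fitness(p, sample) for p in pos]
    b = max(range(n_bats), key=lambda i: scores[i])
    best, best_score = pos[b][:], scores[b]
    counts = [0] * n_features                   # iterations each feature spent in the global best

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            cand = [0] * n_features
            for j in range(n_features):
                # Pull the velocity toward the global best, clamp it, then sample the bit.
                v = vel[i][j] + (best[j] - pos[i][j]) * freq
                vel[i][j] = max(-v_max, min(v_max, v))
                cand[j] = 1 if rng.random() < sigmoid(vel[i][j]) else 0
            if rng.random() > rate[i]:
                # Local random walk (assumed form): copy the global best and flip one bit.
                cand = best[:]
                k = rng.randrange(n_features)
                cand[k] = 1 - cand[k]
            score = fitness(cand, sample)
            if score >= scores[i] and rng.random() < loud[i]:
                # Accept the move; bats get quieter and pulse more often over time.
                pos[i], scores[i] = cand, score
                loud[i] *= alpha
                rate[i] = r0[i] * (1.0 - math.exp(-gamma * t))
            if score > best_score:
                best, best_score = cand[:], score
        for j in range(n_features):
            counts[j] += best[j]

    ranking = sorted(range(n_features), key=lambda j: -counts[j])
    return best, ranking


def toy_fitness(mask: Mask, sample: Sequence[int]) -> float:
    """Hypothetical stand-in for RF accuracy: features 0 and 2 help, the rest cost 0.5 each."""
    return sum(1.0 if j in (0, 2) else -0.5 for j, m in enumerate(mask) if m)


if __name__ == "__main__":
    best, ranking = binary_bat_feature_selection(6, 100, toy_fitness, n_iter=50, seed=1)
    print(best, ranking)
```

Keeping `fitness` as a callable separates the search from the classifier, so the same loop could wrap a Random Forest, as in the paper, or any other model.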

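
For reference, the evaluation measures listed in the abstract can be computed from a binary classifier's predicted class-1 probabilities. The hypothetical helper `binary_metrics` below is a minimal sketch, assuming label 1 is the positive (malignant) class, a 0.5 decision threshold, and that both classes occur in `y_true`. RAE and RRSE are computed here relative to a baseline that always predicts the mean of `y_true`; tools such as Weka typically take that baseline from the training-set class priors and report the two relative errors as percentages, so their values may differ.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], p_pos: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Hypothetical helper: classification and error metrics for 0/1 labels and P(class 1)."""
    n = len(y_true)
    y_pred = [1 if p >= threshold else 0 for p in p_pos]
    pairs = list(zip(y_true, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 0.0

    # Matthews correlation coefficient from the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Probability-based errors; the relative errors use a mean-label baseline (assumption).
    err = [p - y for y, p in zip(y_true, p_pos)]
    mean_y = sum(y_true) / n
    base = [mean_y - y for y in y_true]
    return {
        "kappa": kappa,
        "mcc": mcc,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mae": sum(abs(e) for e in err) / n,
        "rmse": math.sqrt(sum(e * e for e in err) / n),
        "rae": sum(abs(e) for e in err) / sum(abs(b) for b in base),
        "rrse": math.sqrt(sum(e * e for e in err) / sum(b * b for b in base)),
    }


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.6]))
```

Kappa and MCC depend only on the thresholded predictions, whereas MAE, RMSE, RAE and RRSE use the raw probabilities, so two models with the same confusion matrix can still differ on the error measures.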

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00d2/5555532/d73e70a17808/APJCP-18-1257-g001.jpg

Similar articles

Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets.

Iran J Basic Med Sci. 2016 May;19(5):476-82.

Correlation-Based Ensemble Feature Selection Using Bioinspired Algorithms and Classification Using Backpropagation Neural Network.

Comput Math Methods Med. 2019 Sep 23;2019:7398307. doi: 10.1155/2019/7398307. eCollection 2019.

A structured combination of ensemble classifier and filter-based feature selection to improve breast cancer diagnosis.

J Cancer Res Clin Oncol. 2023 Nov;149(16):14519-14534. doi: 10.1007/s00432-023-05238-4. Epub 2023 Aug 12.

Tuning to optimize SVM approach for assisting ovarian cancer diagnosis with photoacoustic imaging.

Biomed Mater Eng. 2015;26 Suppl 1:S975-81. doi: 10.3233/BME-151392.

Feature Selection and Classification of Clinical Datasets Using Bioinspired Algorithms and Super Learner.

Comput Math Methods Med. 2021 May 17;2021:6662420. doi: 10.1155/2021/6662420. eCollection 2021.

Breast cancer diagnosis using the fast learning network algorithm.

Front Oncol. 2023 Apr 27;13:1150840. doi: 10.3389/fonc.2023.1150840. eCollection 2023.

Medical data mining by fuzzy modeling with selected features.

Artif Intell Med. 2008 Jul;43(3):195-206. doi: 10.1016/j.artmed.2008.04.004. Epub 2008 Jun 5.

Enhanced cancer recognition system based on random forests feature elimination algorithm.

J Med Syst. 2012 Aug;36(4):2577-85. doi: 10.1007/s10916-011-9730-1. Epub 2011 May 13.

An enhanced and efficient approach for feature selection for chronic human disease prediction: A breast cancer study.

Heliyon. 2024 Feb 28;10(5):e26799. doi: 10.1016/j.heliyon.2024.e26799. eCollection 2024 Mar 15.

Cited by

A comparative evaluation of nature-inspired algorithms for feature selection problems.

Heliyon. 2023 Dec 12;10(1):e23571. doi: 10.1016/j.heliyon.2023.e23571. eCollection 2024 Jan 15.

Machine Learning Approach for Metabolic Syndrome Diagnosis Using Explainable Data-Augmentation-Based Classification.

Diagnostics (Basel). 2022 Dec 10;12(12):3117. doi: 10.3390/diagnostics12123117.

Multiclass feature selection with metaheuristic optimization algorithms: a review.

Neural Comput Appl. 2022;34(22):19751-19790. doi: 10.1007/s00521-022-07705-4. Epub 2022 Aug 30.

Recent advances of bat-inspired algorithm, its versions and applications.

Neural Comput Appl. 2022;34(19):16387-16422. doi: 10.1007/s00521-022-07662-y. Epub 2022 Aug 11.

Meta-Heuristic Algorithm-Tuned Neural Network for Breast Cancer Diagnosis Using Ultrasound Images.

Front Oncol. 2022 Jun 13;12:834028. doi: 10.3389/fonc.2022.834028. eCollection 2022.
