Modified Bat Algorithm for Feature Selection with the Wisconsin Diagnosis Breast Cancer (WDBC) Dataset.

Authors

Jeyasingh Suganthi, Veluchamy Malathi

Affiliation

Department of Computer Science and Engineering, Raja College of Engineering and Technology, Madurai, Tamilnadu, India. Email:

Publication information

Asian Pac J Cancer Prev. 2017 May 1;18(5):1257-1264. doi: 10.22034/APJCP.2017.18.5.1257.

Abstract

Early diagnosis of breast cancer is essential to saving patients' lives. Medical datasets usually contain a wide variety of data, which can cause confusion during diagnosis. The Knowledge Discovery in Databases (KDD) process helps to improve efficiency, but it requires that inappropriate and repeated data be eliminated from the dataset before final diagnosis. This can be done with any of the feature selection algorithms available in data mining, and feature selection is considered a vital step for increasing classification accuracy. This paper proposes a Modified Bat Algorithm (MBA) for feature selection that eliminates irrelevant features from the original dataset. The Bat algorithm was modified to use simple random sampling to select random instances from the dataset, and ranking with the global best features was used to identify the predominant features in the dataset. The selected features are used to train a Random Forest (RF) classifier. MBA feature selection improved the classification accuracy of RF in identifying the occurrence of breast cancer. The Wisconsin Diagnosis Breast Cancer (WDBC) dataset was used to evaluate the performance of the proposed MBA feature selection algorithm. The proposed algorithm achieved better performance in terms of the Kappa statistic, Matthews Correlation Coefficient (MCC), precision, F-measure, recall, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE).
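
The evaluation measures named in the abstract have standard definitions, and a minimal sketch of them may help when reading the reported results. The Python below is an illustration only, not the authors' code. The function names `classification_metrics` and `error_metrics` are hypothetical, labels are assumed to be binary (1 = malignant, 0 = benign), and the two relative errors are assumed to use the mean of the true labels as their baseline predictor, which may differ from the baseline used by the paper's tooling.

```python
import math


def classification_metrics(y_true, y_pred):
    """Confusion-matrix measures for binary labels (assumed 1 = malignant, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    # Matthews Correlation Coefficient (MCC).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
        "mcc": mcc,
    }


def error_metrics(y_true, y_score):
    """MAE, RMSE, RAE and RRSE for numeric predictions.

    Assumption: the relative errors are measured against a baseline that
    always predicts the mean of y_true, so y_true must not be constant.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_err = sum(abs(s - t) for t, s in zip(y_true, y_score))
    sq_err = sum((s - t) ** 2 for t, s in zip(y_true, y_score))
    base_abs = sum(abs(t - mean_true) for t in y_true)
    base_sq = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae": abs_err / base_abs,
        "rrse": math.sqrt(sq_err / base_sq),
    }
```

Higher values are better for the first five measures, while lower values are better for the four error measures. `error_metrics` accepts either hard 0/1 predictions or predicted class probabilities as `y_score`.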
