Department of Information Technology, Dr. Mahalingam College of Engineering and Technology, Pollachi, Tamilnadu, India.
Curr Med Imaging. 2020;16(3):249-261. doi: 10.2174/1573405614666180720152838.
Data mining algorithms are widely used to classify data, and in disease prediction, keeping computation time to a minimum plays a vital role.
The aim of this paper is to develop a classification model from reduced features and instances.
We propose four search algorithms for feature selection. The first, the Random Global Optimal (RGO) search algorithm, searches for the continuous, global optimal subset of features from a random population. The second, the Global and Local Optimal (GLO) search algorithm, searches for the global and local optimal subset of features from the population. The third, the Random Local Optimal (RLO) search algorithm, generates a random, local optimal subset of features from a random population. The fourth, the Random Global and Local Optimal (RGLO) search algorithm, searches for the continuous, global and local optimal subset of features from a random population and combines the properties of the first three algorithms. The feature subsets generated by the four search algorithms are evaluated using the consistency-based subset evaluation measure. An instance-based learning algorithm is then applied to the resulting feature dataset to remove instances that are redundant or irrelevant for classification. The model built with a naïve Bayesian classifier from the reduced features and instances is validated with tenfold cross-validation.
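To make the subset evaluation step concrete, the following is a minimal sketch of a consistency-based measure, assuming the common inconsistency-rate formulation: instances that share the same values on the selected features but do not belong to the majority class of that group count as inconsistent. The function name `consistency`, the toy data, and the assumption that features are already discretized are illustrative and are not taken from the paper, which may compute the measure differently.

```python
from collections import Counter, defaultdict


def consistency(rows, labels, subset):
    """Return 1 - inconsistency rate for the feature indices in `subset`.

    Assumes discrete feature values (hypothetical preprocessing step).
    """
    # Group class labels by the pattern of values on the selected features.
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[i] for i in subset)
        groups[pattern][label] += 1

    # Within each pattern, every instance outside the majority class
    # counts as inconsistent.
    inconsistent = sum(
        sum(counts.values()) - max(counts.values())
        for counts in groups.values()
    )
    return 1.0 - inconsistent / len(rows)


# Toy dataset (illustrative only): feature 0 fully determines the class.
rows = [
    (0, 0, 1),
    (0, 1, 1),
    (1, 0, 0),
    (1, 1, 0),
    (0, 0, 0),
    (1, 1, 1),
]
labels = ["a", "a", "b", "b", "a", "b"]

if __name__ == "__main__":
    for subset in ([0], [1], [0, 1], []):
        print(subset, round(consistency(rows, labels, subset), 4))
```

Under this formulation, a search algorithm would prefer the smallest subset whose consistency matches that of the full feature set. In the toy data, feature 0 alone is already fully consistent with the class labels.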
The classification accuracy of the naïve Bayesian classifier on features selected by the RGLO search algorithm is 94.82% for the Breast, 97.4% for the DLBCL, 98.83% for the SRBCT, and 98.89% for the Leukemia datasets.
The reduced feature set obtained with RGLO search yields a higher prediction rate with less computation time than the complete dataset and the other proposed subset generation algorithms.