Department of Artificial Intelligence, Korea University, Anam-dong Seongbuk-gu, Seoul 02841, Republic of Korea.
Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638401, India.
J Healthc Eng. 2021 Jul 29;2021:6680424. doi: 10.1155/2021/6680424. eCollection 2021.
In bioinformatics, feature selection for cancer classification is a primary research area; it is used to select the most informative genes from the thousands of genes in a microarray. Microarray data are generally noisy and highly redundant, and they have an extremely asymmetric dimensionality, since most of the genes they contain are believed to be uninformative. This paper presents a methodology for classifying high-dimensional lung cancer microarray data using feature selection and optimization techniques. The methodology has two stages. In the first stage, each gene is ranked with standard gene selection techniques: Information Gain, the Relief-F test, the Chi-square statistic, and the t-statistic test. The top-scored genes from these rankings are then combined into a new feature subset. In the second stage, this subset is further optimized with swarm intelligence techniques, namely Grasshopper Optimization (GO), Moth Flame Optimization (MFO), Bacterial Foraging Optimization (BFO), Krill Herd Optimization (KHO), and Artificial Fish Swarm Optimization (AFSO), which yields the final optimized subset. The selected genes are then classified with a Naïve Bayesian Classifier (NBC), Decision Trees (DT), Support Vector Machines (SVM), and K-Nearest Neighbour (KNN). The best results are obtained when the Relief-F test is combined with AFSO and a Decision Trees classifier on one hundred genes, giving the highest classification accuracy of 99.10%.
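The first (filter) stage can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a two-class expression matrix `X` (samples × genes) with labels `y` in {0, 1}, uses Welch's t-statistic and a simplified Relief (one nearest hit and one nearest miss, rather than full Relief-F with k neighbours) as two example rankers, and merges the top-k genes of each ranker by set union. The function names and the toy data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t_scores(X, y):
    """Absolute Welch t-statistic per gene (column of X) for labels y in {0, 1}."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        # A gene that is constant in both classes gets a score of 0.
        scores.append(abs(mean(a) - mean(b)) / se if se > 0 else 0.0)
    return scores


def relief_scores(X, y):
    """Simplified two-class Relief: one nearest hit and one nearest miss per sample.

    Full Relief-F averages over k neighbours and also handles more than two classes.
    """
    n, d = len(X), len(X[0])
    # Normalise per-gene differences by the gene's range (1.0 for constant genes).
    ranges = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / ranges[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        others = [m for m in range(n) if m != i]
        hit = min((m for m in others if y[m] == y[i]), key=lambda m: dist(X[i], X[m]))
        miss = min((m for m in others if y[m] != y[i]), key=lambda m: dist(X[i], X[m]))
        for j in range(d):
            # Reward genes that differ across classes, penalise those that differ within one.
            w[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return w


def top_k(scores, k):
    """Indices of the k highest-scoring genes, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def merge_top_genes(score_lists, k):
    """Union of the top-k genes from several rankers, in first-seen order."""
    merged = []
    for scores in score_lists:
        for j in top_k(scores, k):
            if j not in merged:
                merged.append(j)
    return merged


# Hypothetical toy data: 6 samples x 4 genes, two classes.
X = [
    [1.0, 5.0, 2.0, 3.0],
    [1.2, 3.0, 2.5, 3.0],
    [0.9, 4.0, 1.8, 3.0],
    [5.0, 4.1, 3.0, 3.0],
    [5.2, 5.1, 3.6, 3.0],
    [4.8, 2.9, 2.9, 3.0],
]
y = [0, 0, 0, 1, 1, 1]

print(merge_top_genes([welch_t_scores(X, y), relief_scores(X, y)], k=2))  # → [0, 2]
```

On the toy data, gene 0 separates the classes cleanly, gene 2 partially, gene 1 is noise, and gene 3 is constant, so both rankers put genes 0 and 2 on top. The other rankers named above (Information Gain, Chi-square) would plug into `merge_top_genes` the same way, as additional score lists.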
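The second (wrapper) stage of the methodology searches for a smaller, better subset of the filtered genes, scoring each candidate subset by classification performance. The sketch below does not implement GO, MFO, BFO, KHO, or AFSO. It deliberately swaps in a much simpler stochastic bit-flip hill climber so the wrapper idea stays visible, and it scores subsets by leave-one-out accuracy of a 1-nearest-neighbour classifier. All names, the fitness definition (accuracy, with ties broken toward fewer genes), the iteration count, and the toy data are assumptions for illustration.

```python
import random


def loocv_knn_accuracy(X, y, genes, k=1):
    """Leave-one-out accuracy of a k-NN classifier using only the columns in `genes`."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # (squared Euclidean distance, label) for every other sample, nearest first.
        neighbours = sorted(
            (sum((X[i][g] - X[m][g]) ** 2 for g in genes), y[m])
            for m in range(len(X))
            if m != i
        )
        votes = [label for _, label in neighbours[:k]]
        predicted = max(set(votes), key=votes.count)
        correct += predicted == y[i]
    return correct / len(X)


def bitflip_search(X, y, candidates, iterations=200, seed=0):
    """Stochastic bit-flip hill climbing over subsets of `candidates`.

    Fitness is (LOOCV 1-NN accuracy, -subset size), so ties favour smaller subsets.
    """
    rng = random.Random(seed)

    def fitness(mask):
        genes = [g for g, keep in zip(candidates, mask) if keep]
        return (loocv_knn_accuracy(X, y, genes), -len(genes))

    mask = [True] * len(candidates)  # start from the full filtered subset
    best = fitness(mask)
    for _ in range(iterations):
        trial = mask[:]
        trial[rng.randrange(len(trial))] ^= True  # move one gene in or out
        score = fitness(trial)
        if score > best:  # accept only strict improvements
            mask, best = trial, score
    return [g for g, keep in zip(candidates, mask) if keep], best[0]


# Hypothetical toy data: 6 samples x 4 genes, two classes.
X = [
    [1.0, 5.0, 2.0, 3.0],
    [1.2, 3.0, 2.5, 3.0],
    [0.9, 4.0, 1.8, 3.0],
    [5.0, 4.1, 3.0, 3.0],
    [5.2, 5.1, 3.6, 3.0],
    [4.8, 2.9, 2.9, 3.0],
]
y = [0, 0, 0, 1, 1, 1]

genes, accuracy = bitflip_search(X, y, candidates=[0, 1, 2, 3])
print(genes, accuracy)
```

In a population-based method such as AFSO, the single mask would presumably be replaced by a swarm of candidate masks moved by that algorithm's own update rules, while a fitness function like the one above could stay unchanged.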