Gunavathi C, Premalatha K
Int J Data Min Bioinform. 2015;13(3):248-65. doi: 10.1504/ijdmb.2015.072092.
The Cuckoo Search (CS) optimisation algorithm is applied to feature selection for cancer classification using microarray gene expression data. Because gene expression data contain thousands of genes but only a small number of samples, feature selection methods can be used to select informative genes and thereby improve classification accuracy. First, the genes are ranked by their T-statistic, Signal-to-Noise Ratio (SNR) and F-statistic values. CS is then used to find the informative genes among the top-m ranked genes, with the classification accuracy of the k-Nearest Neighbour (kNN) classifier serving as the fitness function. The proposed method is evaluated on ten different cancer gene expression datasets. The results show that CS achieves 100% average accuracy on the DLBCL Harvard, Lung Michigan, Ovarian Cancer, AML-ALL and Lung Harvard2 datasets, and that it outperforms existing techniques on the DLBCL outcome and prostate datasets.
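The abstract does not give the exact ranking formulas. As a minimal sketch, assuming a two-class dataset and the commonly used definitions (a Welch-style t-statistic, Golub's signal-to-noise ratio (mu_a - mu_b) / (sd_a + sd_b), and a one-way ANOVA F-statistic), each gene can be scored and the top-m kept. The helper names `gene_scores` and `top_m` and the toy data are hypothetical illustrations, not the authors' code.

```python
import math
import random
from statistics import mean, stdev


def gene_scores(X, y):
    """Score every gene of a two-class dataset (assumed formulas).

    X: list of samples, each a list of expression values (one per gene).
    y: list of class labels, each 0 or 1.
    Returns three lists (T, SNR, F) with one score per gene;
    a larger score means a more informative gene.
    """
    eps = 1e-12  # avoids division by zero for constant genes
    n = len(X)
    t_scores, snr_scores, f_scores = [], [], []
    for g in range(len(X[0])):
        a = [x[g] for x, c in zip(X, y) if c == 0]
        b = [x[g] for x, c in zip(X, y) if c == 1]
        mu_a, mu_b, mu = mean(a), mean(b), mean(x[g] for x in X)
        sd_a, sd_b = stdev(a), stdev(b)
        # Welch-style two-sample t-statistic (absolute value)
        t = (mu_a - mu_b) / math.sqrt(sd_a**2 / len(a) + sd_b**2 / len(b) + eps)
        # Golub-style signal-to-noise ratio (absolute value)
        snr = (mu_a - mu_b) / (sd_a + sd_b + eps)
        # One-way ANOVA F with k = 2 classes: between-class mean square
        # (k - 1 = 1 degree of freedom) over within-class mean square (n - 2)
        ss_between = len(a) * (mu_a - mu) ** 2 + len(b) * (mu_b - mu) ** 2
        ss_within = sum((v - mu_a) ** 2 for v in a) + sum((v - mu_b) ** 2 for v in b)
        f = ss_between / (ss_within / (n - 2) + eps)
        t_scores.append(abs(t))
        snr_scores.append(abs(snr))
        f_scores.append(f)
    return t_scores, snr_scores, f_scores


def top_m(scores, m):
    """Indices of the m highest-scoring genes, best first."""
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:m]


# Hypothetical toy data: 20 samples x 50 genes, gene 7 made strongly differential
random.seed(0)
y = [0] * 10 + [1] * 10
X = [[random.gauss(0, 1) for _ in range(50)] for _ in y]
for x, c in zip(X, y):
    if c == 1:
        x[7] += 5.0
T, SNR, F = gene_scores(X, y)
print("top-5 genes by SNR:", top_m(SNR, 5))
```

For two classes the ANOVA F-statistic equals the square of the pooled-variance t-statistic, so the T and F rankings usually agree closely; in this sketch they can still differ slightly because T uses Welch's unpooled variance.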
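The abstract states that CS searches the top-m ranked genes with kNN classification accuracy as its fitness, but not the update rules, the binarisation step or the value of k. The sketch below assumes Yang and Deb's standard Cuckoo Search (Lévy-flight moves generated with Mantegna's algorithm, plus abandonment of a fraction `pa` of the worst nests), a sigmoid transfer function that turns each continuous nest position into a gene subset, and leave-one-out k-NN accuracy with k = 3 as the fitness. It also assumes `X` has already been reduced to the top-m ranked genes. All function names, parameter values and the toy data are hypothetical.

```python
import math
import random


def knn_loocv_accuracy(X, y, genes, k=3):
    """Leave-one-out accuracy of a k-NN classifier restricted to `genes`."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # (squared Euclidean distance, label) to every other sample, nearest first
        dists = sorted(
            (sum((X[i][g] - X[j][g]) ** 2 for g in genes), y[j])
            for j in range(len(X)) if j != i
        )
        votes = [label for _, label in dists[:k]]
        if max(set(votes), key=votes.count) == y[i]:  # majority vote
            correct += 1
    return correct / len(X)


def levy_step(rng, beta=1.5):
    """One Levy-distributed step drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u, v = rng.gauss(0, sigma), rng.gauss(0, 1)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def binary_cuckoo_search(X, y, n_nests=10, n_iter=30, pa=0.25, alpha=0.5, seed=0):
    """Return (selected gene indices, LOOCV accuracy) of the best nest found."""
    rng = random.Random(seed)
    d = len(X[0])
    clamp = lambda v: max(-4.0, min(4.0, v))  # keeps math.exp from overflowing

    def evaluate(pos):
        # Sigmoid transfer: gene g is kept with probability 1 / (1 + exp(-pos[g]))
        genes = [g for g in range(d) if rng.random() < 1 / (1 + math.exp(-pos[g]))]
        return genes, knn_loocv_accuracy(X, y, genes)

    nests = [[rng.uniform(-4, 4) for _ in range(d)] for _ in range(n_nests)]
    evals = [evaluate(p) for p in nests]
    best = max(range(n_nests), key=lambda i: evals[i][1])
    for _ in range(n_iter):
        # 1) Levy flight around each nest, scaled by its distance to the best nest;
        #    the new egg replaces a randomly chosen nest if it is fitter.
        for i in range(n_nests):
            new = [clamp(nests[i][g] + alpha * levy_step(rng) * (nests[i][g] - nests[best][g]))
                   for g in range(d)]
            cand = evaluate(new)
            j = rng.randrange(n_nests)
            if cand[1] > evals[j][1]:
                nests[j], evals[j] = new, cand
        # 2) The worst fraction pa of the nests is abandoned and rebuilt at random.
        order = sorted(range(n_nests), key=lambda i: evals[i][1])
        for i in order[: int(pa * n_nests)]:
            nests[i] = [rng.uniform(-4, 4) for _ in range(d)]
            evals[i] = evaluate(nests[i])
        best = max(range(n_nests), key=lambda i: evals[i][1])
    return evals[best]


# Hypothetical toy data: 12 samples x 8 genes; gene 0 separates the classes,
# genes 1-7 are pure noise.
data_rng = random.Random(1)
y = [0] * 6 + [1] * 6
X = [[(10.0 if c else 0.0) + 0.1 * i] + [data_rng.gauss(0, 1) for _ in range(7)]
     for i, c in enumerate(y)]
genes, acc = binary_cuckoo_search(X, y)
print("selected genes:", genes, "LOOCV accuracy:", acc)
```

Leave-one-out accuracy is used here because, with so few samples, holding out a separate validation set would leave very little training data; the original work may estimate kNN accuracy differently.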