M S Karthika, Rajaguru Harikumar, Nair Ajin R
Department of Information Technology, Bannari Amman Institute of Technology, Sathyamangalam, India.
Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, India.
Heliyon. 2024 Aug 17;10(16):e36419. doi: 10.1016/j.heliyon.2024.e36419. eCollection 2024 Aug 30.
Microarray gene expression data are high-dimensional and redundant, and much of the information they carry is correlated with background noise. This paper combines dimensionality reduction and feature selection to classify high-dimensional lung cancer microarray data. The approach is implemented in two phases. First, the dimensionality of the gene data is reduced using the Hilbert Transform, Detrended Fluctuation Analysis (DFA), and Least Squares Linear Regression. Second, the reduced data are further optimized using Elephant Herd Optimization (EHO) and Cuckoo Search feature selection. Seven classifiers are evaluated: Bayesian Linear Discriminant, Naive Bayes, Random Forest, Decision Tree, SVM (Linear), SVM (Polynomial), and SVM (RBF). Classifier performance is analysed with and without feature selection. The SVM (Linear) classifier with DFA dimensionality reduction and EHO feature selection achieved the highest accuracy, 92.26 %.
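As an illustration of the first phase, the following is a minimal sketch of Detrended Fluctuation Analysis built on a closed-form least squares line fit. The abstract does not state how DFA output is mapped to reduced features, so this sketch makes its own assumptions: it treats one expression profile as a 1-D sequence and summarises it by a single scaling exponent. The function names `linear_fit` and `dfa_exponent`, the window sizes, and the synthetic input are all hypothetical and are not taken from the paper.

```python
import math
import random


def linear_fit(xs, ys):
    """Closed-form least squares line: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_exponent(signal, window_sizes=(4, 8, 16, 32)):
    """Estimate the DFA scaling exponent of a 1-D sequence.

    Assumption: window sizes are illustrative, not the paper's settings.
    A constant input gives zero fluctuation and is not handled here.
    """
    mean = sum(signal) / len(signal)

    # Step 1: integrate the mean-centred signal into a profile.
    profile = []
    running = 0.0
    for value in signal:
        running += value - mean
        profile.append(running)

    log_sizes, log_flucts = [], []
    for size in window_sizes:
        n_windows = len(profile) // size
        if n_windows < 2:
            continue  # too few windows for a stable estimate
        xs = list(range(size))
        squared_error = 0.0
        for w in range(n_windows):
            segment = profile[w * size:(w + 1) * size]
            # Step 2: remove the local linear trend in each window.
            slope, intercept = linear_fit(xs, segment)
            squared_error += sum(
                (y - (slope * x + intercept)) ** 2
                for x, y in zip(xs, segment)
            )
        # Step 3: root-mean-square fluctuation at this window size.
        fluctuation = math.sqrt(squared_error / (n_windows * size))
        log_sizes.append(math.log(size))
        log_flucts.append(math.log(fluctuation))

    # Step 4: the exponent is the slope of log F(n) against log n.
    alpha, _ = linear_fit(log_sizes, log_flucts)
    return alpha


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for one sample's expression profile.
    profile = [rng.gauss(0.0, 1.0) for _ in range(1024)]
    print("DFA exponent:", dfa_exponent(profile))
```

In a pipeline like the one described, a summary of this kind would be computed per sample (or per block of genes) before feature selection and classification; how the blocks are chosen is a design decision the abstract leaves open.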