Chellappan Dinesh, Rajaguru Harikumar
Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology, Coimbatore 641 407, Tamil Nadu, India.
Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638 401, Tamil Nadu, India.
Bioengineering (Basel). 2024 Jul 29;11(8):766. doi: 10.3390/bioengineering11080766.
This article investigates how effectively feature extraction and feature selection techniques improve classifier accuracy in Type II Diabetes Mellitus (DM) detection from microarray gene data. To address the inherent high dimensionality of the data, three feature extraction (FE) methods are used: Short-Time Fourier Transform (STFT), Ridge Regression (RR), and Pearson's Correlation Coefficient (PCC). To refine the data further, two meta-heuristic algorithms, Bald Eagle Search Optimization (BESO) and Red Deer Optimization (RDO), are used for feature selection. Seven classification techniques are evaluated with and without feature selection: Non-Linear Regression (NLR), Linear Regression (LR), Gaussian Mixture Models (GMMs), Expectation Maximization (EM), Logistic Regression (LoR), Softmax Discriminant Classifier (SDC), and Support Vector Machine with Radial Basis Function kernel (SVM-RBF). The analysis shows that PCC combined with SVM-RBF achieved an accuracy of 92.85% even without feature selection, and that adding BESO to PCC and SVM-RBF maintained this accuracy. The highest overall accuracy, 97.14%, was achieved when RDO was used for feature selection alongside PCC and SVM-RBF. These findings highlight the potential of feature extraction and selection techniques, particularly RDO with PCC, to improve the accuracy of DM detection from microarray gene data.
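To make the role of Pearson's correlation in dimensionality reduction more concrete, the following is a minimal sketch, not the authors' procedure. The abstract does not specify how PCC is applied, so this example assumes one common variant: ranking genes by the absolute correlation between their expression values and the class label, then keeping the top `k`. The function names `pearson` and `top_k_genes`, the toy data, and the label encoding are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Constant input: correlation is undefined, so treat it as uninformative.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_k_genes(expression, labels, k):
    """Return indices of the k genes with the largest |r| against the labels.

    expression: list of samples, each a list of per-gene expression values.
    labels:     one numeric class label per sample.
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scores.append((abs(pearson(column, labels)), g))
    # Sort by descending |r|; break ties by gene index for determinism.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scores[:k]]


# Toy data, invented for illustration: 4 samples x 3 genes.
# Assumed label encoding: 0 = non-diabetic, 1 = diabetic.
toy_expression = [
    [1.0, 5.0, 0.2],
    [2.0, 5.1, 0.9],
    [3.0, 4.9, 0.1],
    [4.0, 5.0, 0.8],
]
toy_labels = [0, 0, 1, 1]

selected = top_k_genes(toy_expression, toy_labels, k=2)
print(selected)
```

In a full pipeline along the lines the abstract describes, the reduced feature set would then be passed to a feature selection step (such as BESO or RDO) and to a classifier such as an SVM with an RBF kernel. Those stages are omitted here because the abstract gives no implementation details for them.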