Panda Pinakshi, Bisoy Sukant Kishoro, Kautish Sandeep, Ahmad Reyaz, Irshad Asma, Sarwar Nadeem
Department of Computer Science & Engineering, C. V. Raman Global University, Bidyanagar, Mahura, Janla 752054, Bhubaneswar, Odisha, India.
Apex Institute of Technology, Chandigarh University, Mohali, Punjab, India.
Int J Telemed Appl. 2024 Oct 17;2024:4105224. doi: 10.1155/2024/4105224. eCollection 2024.
Cancer is a leading cause of death worldwide, and machine learning (ML) has become an important tool for early cancer detection, which can help lower the death toll. ML-based models for cancer diagnosis are built on two forms of data: gene expression data and microarray data. Gene expression data has many dimensions, and the efficiency of an ML-based model decreases on high-dimensional data. Microarray data is likewise characterized by high dimensionality: a large number of features and a small sample size. In this work, two ensemble techniques are proposed, one based on majority voting and one based on weighted averaging. Correlation feature selection (CFS) is used for feature selection, and an improved grey wolf optimizer (IGWO) is used for feature optimization. Support vector machine (SVM), multilayer perceptron (MLP), logistic regression (LR), decision tree (DT), adaptive boosting (AdaBoost), extreme learning machine (ELM), and K-nearest neighbor (KNN) classifiers are used as base learners. The results of the individual base learners are then combined using the weighted average and majority voting ensemble methods. Accuracy (ACC), specificity (SPE), sensitivity (SEN), precision (PRE), Matthews correlation coefficient (MCC), and F1-score (F1-S) are used to assess performance. Our results show that majority voting achieves better performance than the weighted average ensemble technique.
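The two combination rules and the six reported metrics can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: a binary (tumor versus normal) label, base learners that output class-1 probabilities, a decision threshold of 0.5, and the hypothetical function names `majority_vote`, `weighted_average`, and `binary_metrics`. The CFS and IGWO stages are not shown.

```python
import math

import numpy as np


def majority_vote(labels):
    """Combine hard 0/1 predictions of shape (n_models, n_samples).

    A sample is labeled 1 only when strictly more than half of the
    base learners predict 1 (ties fall to class 0; an assumption).
    """
    labels = np.asarray(labels)
    return (labels.sum(axis=0) * 2 > labels.shape[0]).astype(int)


def weighted_average(probs, weights, threshold=0.5):
    """Combine class-1 probabilities of shape (n_models, n_samples).

    The weights are normalized to sum to 1, so they may be given as
    raw scores (for example, validation accuracies; an assumption).
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    averaged = weights @ probs / weights.sum()
    return (averaged >= threshold).astype(int)


def _safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred):
    """Compute ACC, SEN, SPE, PRE, F1-S, and MCC from 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))

    sen = _safe_div(tp, tp + fn)  # sensitivity (recall)
    pre = _safe_div(tp, tp + fp)  # precision
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "ACC": _safe_div(tp + tn, tp + tn + fp + fn),
        "SEN": sen,
        "SPE": _safe_div(tn, tn + fp),
        "PRE": pre,
        "F1-S": _safe_div(2 * pre * sen, pre + sen),
        "MCC": _safe_div(tp * tn - fp * fn, mcc_denominator),
    }
```

In such a sketch, the hard labels passed to `majority_vote` would typically come from thresholding each base learner's probabilities, while `weighted_average` works on the probabilities directly; the two rules can therefore disagree on samples where the learners are split.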