Alweshah Mohammed, Aldabbas Yasmeen, Abu-Salih Bilal, Oqeil Saleh, Hasan Hazem S, Alkhalaileh Saleh, Kassaymeh Sofian
Prince Abdullah Bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Al-Salt, Jordan.
Department of Computer Science, King Abdullah II School of Information Technology, The University of Jordan, Amman, Jordan.
Heliyon. 2023 Sep 14;9(9):e20133. doi: 10.1016/j.heliyon.2023.e20133. eCollection 2023 Sep.
Gene Selection (GS) is a strategy for reducing redundancy, limited expressiveness, and low informativeness in gene expression datasets obtained by DNA microarray technology. These datasets are high-dimensional, with a large discrepancy between the number of samples and the number of genes. This inherent imbalance makes GS especially difficult in microarray expression data analysis. The main goal of this study is to offer a simple and computationally efficient approach to attribute selection in microarray gene expression data. To this end, we apply the Black Widow Optimization algorithm (BWO) to GS in two forms: the unaltered BWO, and a hybrid that combines BWO with the Iterated Greedy algorithm (BWO-IG). The hybridization aims to improve the local search capability of BWO and thereby make gene selection more efficient. For empirical validation, a series of tests was carried out on nine benchmark datasets obtained from the gene expression data repository. The results show that BWO-IG performs better than the original BWO algorithm. In particular, the more efficient local search of the hybrid makes it easier to identify relevant genes and yields more reliable results in terms of both accuracy and the degree of gene pruning. To put the effectiveness of BWO-IG into context, it is also compared against five recent wrapper Feature Selection (FS) methods, namely BIMFOHHO, BMFO, BHHO, BCS, and BBA. This comparison shows that BWO-IG selects fewer genes than these methods while still achieving high classification accuracy.
The key findings were an average classification accuracy of 94.426, an average fitness value of 0.061, and an average number of selected genes of 2933.767.
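To make the idea of an iterated greedy local search over a gene subset concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the fitness function or the destruction and construction operators, so everything here is an assumption: the weighted-sum fitness (a form commonly used in wrapper feature selection), the weight `alpha = 0.99`, the names `wrapper_fitness`, `iterated_greedy`, `toy_error`, and `toy_fitness`, and the toy error measure that stands in for a real classifier. The BWO population search itself is not shown.

```python
import random
from typing import Callable, List, Optional, Tuple

Mask = List[int]  # 1 = gene selected, 0 = gene dropped


def wrapper_fitness(error_rate: float, n_selected: int, n_total: int,
                    alpha: float = 0.99) -> float:
    """Assumed weighted-sum fitness (lower is better).

    Trades classification error against the fraction of genes kept.
    """
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)


def iterated_greedy(mask: Mask,
                    fitness: Callable[[Mask], float],
                    destroy_size: int = 2,
                    iterations: int = 50,
                    rng: Optional[random.Random] = None) -> Tuple[Mask, float]:
    """Refine a gene mask by repeated destruction and greedy reconstruction."""
    rng = rng or random.Random(0)
    best, best_fit = list(mask), fitness(mask)
    for _ in range(iterations):
        candidate = list(best)
        # Destruction: drop a few randomly chosen selected genes.
        selected = [i for i, bit in enumerate(candidate) if bit]
        for i in rng.sample(selected, min(destroy_size, len(selected))):
            candidate[i] = 0
        # Construction: keep re-adding the single gene that lowers
        # fitness the most, until no addition helps.
        cand_fit = fitness(candidate)
        while True:
            best_j, best_j_fit = None, cand_fit
            for j, bit in enumerate(candidate):
                if bit == 0:
                    candidate[j] = 1
                    f = fitness(candidate)
                    candidate[j] = 0
                    if f < best_j_fit:
                        best_j, best_j_fit = j, f
            if best_j is None:
                break
            candidate[best_j] = 1
            cand_fit = best_j_fit
        # Accept the rebuilt solution only if it is strictly better.
        if cand_fit < best_fit:
            best, best_fit = candidate, cand_fit
    return best, best_fit


# Toy stand-in for a classifier: genes 0 and 3 are the only informative
# ones, and the "error" is the fraction of them missing from the mask.
INFORMATIVE = (0, 3)


def toy_error(mask: Mask) -> float:
    return sum(1 for g in INFORMATIVE if not mask[g]) / len(INFORMATIVE)


def toy_fitness(mask: Mask) -> float:
    return wrapper_fitness(toy_error(mask), sum(mask), len(mask))


if __name__ == "__main__":
    start = [1] * 10  # begin with every gene selected
    refined, fit = iterated_greedy(start, toy_fitness)
    print(refined, fit)
```

In a real wrapper setting, `toy_error` would be replaced by the cross-validated error of a classifier trained on the selected genes, and a routine like `iterated_greedy` would be applied to candidate solutions produced by the population-based search.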