Mehta Sumet, Han Fei, Sohail Muhammad, Twala Bhekisipho, Ullah Asad, Ullah Fasee, Khan Arfat Ahmad, Ling Qinghua
School of Computer Science & Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu, China.
R&D, Star Engineers India Pvt. Ltd., Pune, Maharashtra, India.
PeerJ Comput Sci. 2025 May 28;11:e2872. doi: 10.7717/peerj-cs.2872. eCollection 2025.
The analysis of high-dimensional microarray gene expression data presents critical challenges, including excessive dimensionality, increased computational burden, and sensitivity to random initialization. Traditional optimization algorithms often produce inconsistent and suboptimal results and fail to preserve local data structures, which limits both predictive accuracy and biological interpretability. To address these limitations, this study proposes an adaptive neighborhood-preserving multi-objective particle swarm optimization (ANPMOPSO) framework for gene selection. ANPMOPSO introduces four key innovations: (1) a weighted neighborhood-preserving ensemble embedding (WNPEE) technique for dimensionality reduction that retains local structure; (2) Sobol sequence (SS) initialization to enhance population diversity and convergence stability; (3) a differential evolution (DE)-based adaptive velocity update to dynamically balance exploration and exploitation; and (4) a novel ranking strategy that combines Pareto dominance with neighborhood preservation quality to prioritize biologically meaningful gene subsets. Experimental evaluations on six benchmark microarray datasets and eleven multi-modal test functions (MMFs) demonstrate that ANPMOPSO consistently outperforms state-of-the-art methods. For example, it achieves 100% classification accuracy on Leukemia and Small-Round-Blue-Cell Tumor (SRBCT) using only 3-5 genes, improving accuracy by 5-15% over competitors while reducing gene subsets by 40-60%. Additionally, on MMFs, ANPMOPSO attains superior hypervolume values (e.g., 1.0617 ± 0.2225 on MMF1, approximately 10-20% higher than competitors), confirming its robustness in balancing convergence and diversity. Although the method incurs higher training time due to its structural and adaptive components, it achieves a strong trade-off between computational cost and biological relevance, making it a promising tool for high-dimensional gene selection in bioinformatics.
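The Pareto dominance component of the ranking strategy can be illustrated with a minimal sketch. It is not the authors' implementation and it omits the neighborhood preservation term entirely. It assumes two objectives that are both minimized, classification error and number of selected genes; the function names `dominates` and `pareto_front` and the candidate values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Assumed objective vector: (classification error, number of selected genes).
# Both objectives are minimized in this sketch.
Objectives = Tuple[float, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate gene subsets, made up for illustration only.
candidates = [(0.00, 5), (0.00, 3), (0.05, 2), (0.10, 2), (0.02, 8)]
front = pareto_front(candidates)
print(front)
```

In this example the subsets with (error, genes) of (0.00, 3) and (0.05, 2) remain on the front, since neither is better than the other on both objectives. A ranking such as the one described above would then need a further criterion, such as neighborhood preservation quality, to order the non-dominated subsets.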