A comprehensive learning based swarm optimization approach for feature selection in gene expression data.

Authors

Easwaran Subha, Venugopal Jothi Prakash, Subramanian Arul Antran Vijay, Sundaram Gopikrishnan, Naseeba Beebi

Affiliations

Department of Science and Humanities, Karpagam College of Engineering, Myleripalayam Village, Coimbatore-641032, Tamilnadu, India.

Department of Information Technology, Karpagam College of Engineering, Myleripalayam Village, Coimbatore-641032, Tamilnadu, India.

Publication information

Heliyon. 2024 Sep 2;10(17):e37165. doi: 10.1016/j.heliyon.2024.e37165. eCollection 2024 Sep 15.

Abstract

Gene expression data analysis is challenging due to the high dimensionality and complexity of the data. Feature selection, which identifies relevant genes, is a common preprocessing step. We propose a Comprehensive Learning-Based Swarm Optimization (CLBSO) approach for feature selection in gene expression data. CLBSO leverages the strengths of ants and grasshoppers to efficiently explore the high-dimensional search space. Ants perform local search and leave pheromone trails to guide the swarm, while grasshoppers use their ability to jump long distances to explore new regions and avoid local optima. The proposed approach was evaluated on several publicly available gene expression datasets and compared with state-of-the-art feature selection methods. CLBSO achieved an average accuracy improvement of 15% over the original high-dimensional data and outperformed other feature selection methods by up to 10%. For instance, in the Pancreatic cancer dataset, CLBSO achieved 97.2% accuracy, significantly higher than XGBoost-MOGA's 84.0%. Convergence analysis showed CLBSO required fewer iterations to reach optimal solutions. Statistical analysis confirmed significant performance improvements, and stability analysis demonstrated consistent gene subset selection across different runs. These findings highlight the robustness and efficacy of CLBSO in handling complex gene expression datasets, making it a valuable tool for enhancing classification tasks in bioinformatics.
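
The division of labor described above (pheromone-guided local search plus occasional long jumps) can be illustrated with a small toy sketch. This is not the authors' CLBSO implementation: the function `hybrid_swarm_select`, the helper `toy_fitness`, the `INFORMATIVE` set, and all parameter names and update rules are hypothetical, chosen only to show how the two kinds of moves might be combined when searching over feature subsets.

```python
import random

INFORMATIVE = {0, 3, 7}  # hypothetical "relevant genes" for the toy objective


def toy_fitness(mask):
    """Toy objective: reward informative features, penalize subset size."""
    selected = [j for j, keep in enumerate(mask) if keep]
    hits = sum(1 for j in selected if j in INFORMATIVE)
    return hits - 0.1 * len(selected)


def hybrid_swarm_select(fitness, n_features, n_agents=10, n_iters=30,
                        evaporation=0.1, jump_prob=0.2, seed=0):
    """Illustrative hybrid search (an assumption, not the published method)."""
    rng = random.Random(seed)
    pheromone = [1.0] * n_features  # one trail value per feature
    best_mask, best_score = None, float("-inf")

    for _ in range(n_iters):
        avg = sum(pheromone) / n_features
        candidates = []
        for _ in range(n_agents):
            if best_mask is not None and rng.random() < jump_prob:
                # "Grasshopper" move: flip a large random block of the best mask
                mask = best_mask[:]
                block = rng.sample(range(n_features), max(1, n_features // 4))
                for j in block:
                    mask[j] = not mask[j]
            else:
                # "Ant" move: keep feature j with a pheromone-weighted probability
                mask = [rng.random() < pheromone[j] / (pheromone[j] + avg)
                        for j in range(n_features)]
            if not any(mask):
                # Never evaluate an empty subset
                mask[rng.randrange(n_features)] = True
            candidates.append((fitness(mask), mask))

        score, mask = max(candidates, key=lambda c: c[0])
        if score > best_score:
            best_score, best_mask = score, mask

        # Evaporate all trails, then reinforce the iteration-best features
        for j in range(n_features):
            pheromone[j] *= 1.0 - evaporation
            if mask[j]:
                pheromone[j] += 1.0

    return best_mask, best_score


mask, score = hybrid_swarm_select(toy_fitness, n_features=12)
print([j for j, keep in enumerate(mask) if keep], score)
```

In a real gene expression setting, `fitness` would typically wrap a cross-validated classifier evaluated on the selected columns; the toy objective here only stands in for that step.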

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5135/11408137/7958f9c8da5e/gr001.jpg
