Simulated annealing aided genetic algorithm for gene selection from microarray data.

Author information

Marjit Shyam, Bhattacharyya Trinav, Chatterjee Bitanu, Sarkar Ram

Affiliations

Department of Computer Science and Engineering, Indian Institute of Information Technology Guwahati, Guwahati, 781015, Assam, India.

Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, West Bengal, India.

Publication information

Comput Biol Med. 2023 May;158:106854. doi: 10.1016/j.compbiomed.2023.106854. Epub 2023 Mar 31.

Abstract

Microarray gene expression datasets have become popular because they help identify different types of cancer directly through biomarkers. These datasets are high-dimensional, with a high gene-to-sample ratio, and only a few genes function as biomarkers. Consequently, much of the data is redundant, and the informative genes must be selected carefully. In this paper, we propose the Simulated Annealing aided Genetic Algorithm (SAGA), a meta-heuristic approach for identifying informative genes in high-dimensional datasets. SAGA combines a two-way mutation-based Simulated Annealing (SA) with a Genetic Algorithm (GA) to balance exploitation and exploration of the search space, respectively. A naive GA depends heavily on its initial population and often gets stuck in a local optimum, leading to premature convergence. To address this, we blend a clustering-based population generation with SA to distribute the initial population of the GA over the entire feature space. To further improve performance, we reduce the initial search space with a score-based filter called the Mutually Informed Correlation Coefficient (MICC). The proposed method is evaluated on 6 microarray and 6 omics datasets. In comparisons with contemporary algorithms, SAGA performs much better than its peers. Our code is available at https://github.com/shyammarjit/SAGA.
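To make the division of labor between the two components concrete, the following is a minimal sketch of a GA whose offspring are refined by simulated annealing. It is not the authors' implementation (see the repository linked above for that). Everything in it is an assumption made for illustration: the toy `fitness` function, the single bit-flip neighbor move standing in for the paper's two-way mutation, the random initial population in place of the clustering-based one, and all parameter values. The MICC filter is omitted.

```python
import math
import random


def fitness(mask, informative):
    """Toy score (assumption): reward informative genes, penalize subset size."""
    hits = sum(1 for i in informative if mask[i])
    return hits - 0.05 * sum(mask)


def flip_one(mask, rng):
    """Neighbor move (assumption): flip one random bit, adding or dropping a gene."""
    child = list(mask)
    j = rng.randrange(len(child))
    child[j] = 1 - child[j]
    return child


def anneal(mask, informative, rng, t0=1.0, cooling=0.9, steps=50):
    """Simulated annealing: local refinement (exploitation) of one gene mask."""
    current, best = list(mask), list(mask)
    t = t0
    for _ in range(steps):
        cand = flip_one(current, rng)
        delta = fitness(cand, informative) - fitness(current, informative)
        # Accept improvements always; accept worse moves with prob exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current = cand
            if fitness(current, informative) > fitness(best, informative):
                best = list(current)
        t *= cooling  # geometric cooling schedule
    return best


def crossover(a, b, rng):
    """One-point crossover between two parent masks."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def sa_aided_ga(n_genes, informative, pop_size=10, generations=15, seed=0):
    """GA loop (exploration) whose offspring are each refined by SA."""
    rng = random.Random(seed)
    # Random initial population (assumption; the paper uses clustering plus SA).
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, informative), reverse=True)
        parents = pop[: pop_size // 2]  # elitist selection of the top half
        children = [
            anneal(crossover(rng.choice(parents), rng.choice(parents), rng),
                   informative, rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, informative))


# Hypothetical toy problem: 20 genes, of which indices 1, 4 and 7 are informative.
best = sa_aided_ga(n_genes=20, informative={1, 4, 7})
```

In a real gene-selection setting the fitness would typically come from a classifier's validation accuracy on the selected genes rather than from a known set of informative indices.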
