Yuan Wei, Li Yaming, Han Zhengpan, Chen Yu, Xie Jinnan, Chen Jianguo, Bi Zhisheng, Xi Jianing
School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China.
Biomedicines. 2024 Sep 12;12(9):2086. doi: 10.3390/biomedicines12092086.
The identification of significant gene biclusters with particular expression patterns and the elucidation of functionally related genes within gene expression data have become critical concerns due to the vast amount of gene expression data generated by RNA sequencing technology. In this paper, a Conserved Gene Expression Module based on Genetic Algorithm (CGEMGA) is proposed. Breast cancer data from the TCGA database are used as the subject of this study. The p-values from Fisher's exact test are used as evaluation metrics to compare the significance of different algorithms, including the Cheng and Church algorithm and the CGEM algorithm, among others. In addition, the F-test is used to investigate the difference between our method and the CGEM algorithm. The computational cost of the different algorithms is further investigated by calculating the running time of each algorithm. Finally, the established driver genes and cancer-related pathways are used to validate the process. The results of 10 independent runs demonstrate that CGEMGA has a superior average p-value of 1.54 × 10 ± 3.06 × 10 compared to all other algorithms. Furthermore, our approach exhibits consistent performance across all methods. The F-test yields a p-value of 0.039, indicating a significant difference between our approach and the CGEM. Computational cost statistics also demonstrate that our approach has a significantly shorter average runtime of 5.22 × 10 ± 1.65 × 10 s compared to the other algorithms. Enrichment analysis indicates that the genes in our approach are significantly enriched for driver genes. Our algorithm is fast and robust, efficiently extracting co-expressed genes and associated co-expression condition biclusters from RNA-seq data.
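As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of a one-sided Fisher's exact test for driver-gene enrichment in a bicluster, computed as a hypergeometric upper tail. The function name, argument names, and the toy counts are assumptions made for illustration; they are not taken from the paper, and the paper's exact test configuration (for example one-sided versus two-sided) is not stated in the abstract.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value, P(X >= k).

    Hypothetical argument names (not from the paper):
      N: number of genes in the background set
      K: number of known driver genes in the background set
      n: number of genes in the bicluster
      k: number of driver genes found in the bicluster
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("counts are inconsistent")
    # Number of ways to draw any n genes from the background.
    total = comb(N, n)
    # Sum hypergeometric probabilities for k, k+1, ..., min(n, K) driver genes.
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(n, K) + 1)
    )
    return tail / total


# Toy example with made-up counts: 4 of 4 bicluster genes are drivers,
# out of 4 drivers among 8 background genes.
p_value = fisher_enrichment_p(k=4, n=4, K=4, N=8)
```

A smaller p-value here means the bicluster contains more driver genes than would be expected from drawing the same number of genes at random, which is the sense in which such p-values can rank the gene sets returned by different biclustering algorithms.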