Gao Qinghui, Ho Christine, Jia Yingmin, Li Jingyi Jessica, Huang Haiyan
Seventh Research Division and Department of Systems and Control, Beihang University, Beijing, China.
J Comput Biol. 2012 Jun;19(6):619-31. doi: 10.1089/cmb.2012.0032.
Identifying a bicluster, that is, a submatrix of a gene expression dataset in which the genes exhibit similar behavior across the columns (conditions), is useful for discovering novel functional gene interactions. In this article, we introduce a new algorithm for finding biClusters with Linear Patterns (CLiP). Rather than maximizing Pearson correlation alone, we introduce a fitness function that also considers the correlation of the complementary genes and conditions. This eliminates the need to determine the bicluster size a priori. We employ both greedy search and a genetic algorithm for optimization, incorporating resampling for more robust discovery. When applied to both real and simulated datasets, our results show that CLiP is superior to existing methods. In analyzing RNA-seq fly and worm time-course data from modENCODE, we uncover a set of similarly expressed genes suggesting maternal dependence. Supplementary Material is available online (at www.liebertonline.com/cmb).
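The idea of scoring a bicluster by its own Pearson correlation together with the correlation on the complementary conditions can be illustrated with a minimal Python sketch. The abstract does not give the CLiP fitness function, so `toy_fitness` and `mean_abs_corr` below are hypothetical names and the score (coherence inside the bicluster minus coherence of the same genes on the remaining conditions) is an assumed stand-in, not the authors' definition. The synthetic data are likewise illustrative only.

```python
import numpy as np


def mean_abs_corr(sub):
    """Mean absolute pairwise Pearson correlation between the rows of `sub`."""
    n_rows, n_cols = sub.shape
    if n_rows < 2 or n_cols < 3:
        return 0.0  # too small to estimate a correlation
    corr = np.corrcoef(sub)  # rows are treated as variables (genes)
    upper = np.triu_indices(n_rows, k=1)  # each gene pair counted once
    return float(np.mean(np.abs(corr[upper])))


def toy_fitness(data, rows, cols):
    """Hypothetical score (an assumption, not the CLiP fitness function):
    coherence of the selected genes on the selected conditions, minus their
    coherence on the complementary conditions."""
    rows = np.asarray(list(rows))
    cols = np.asarray(list(cols))
    comp_cols = np.setdiff1d(np.arange(data.shape[1]), cols)
    inside = mean_abs_corr(data[np.ix_(rows, cols)])
    outside = mean_abs_corr(data[np.ix_(rows, comp_cols)])
    return inside - outside


# Synthetic demo: noise matrix with one planted linear-pattern bicluster.
rng = np.random.default_rng(0)
data = rng.normal(size=(30, 20))
base = rng.normal(size=8)
slopes = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0]
for i, slope in enumerate(slopes):
    # Each planted gene is a linear function of `base` on conditions 0..7.
    data[i, :8] = slope * base + i

planted = toy_fitness(data, range(6), range(8))
background = toy_fitness(data, range(10, 16), range(8))
print(planted > background)
```

Because the score penalizes gene sets that are also correlated outside the chosen conditions, it does not simply grow as conditions are dropped, which is one plausible reading of why a complementary term removes the need to fix the bicluster size in advance. A search procedure such as the greedy or genetic optimization mentioned in the abstract would sit on top of a score of this kind.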