South China University of Technology, Guangzhou.
IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):560-70. doi: 10.1109/TCBB.2011.53. Epub 2011 Mar 3.
The analysis of gene expression data obtained from microarray experiments is important for discovering the biological processes in which genes are involved. Biclustering algorithms have been shown to group genes that exhibit similar expression patterns across a subset of the experimental conditions. In this paper, we propose a new biclustering algorithm based on evolutionary learning. By converting the biclustering problem into a conventional clustering problem, the algorithm can be applied in a search space defined by the conditions. To further reduce the size of the search space, we randomly partition the full set of conditions into a number of condition subsets (subspaces), each containing fewer conditions. The algorithm is applied to each subspace and discovers bicluster seeds within a limited computing time. Finally, an expanding and merging procedure combines the bicluster seeds into larger biclusters according to a homogeneity criterion. We evaluate the performance of the proposed algorithm on synthetic and real microarray data sets. Compared with several previously developed biclustering algorithms, our algorithm shows a significant improvement in discovering additive biclusters.
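The abstract does not specify the homogeneity criterion. The sketch below assumes the Cheng and Church mean squared residue (MSR), a common measure for additive biclusters that equals zero when every entry satisfies a_ij = mu + r_i + c_j. It illustrates two of the steps described above: randomly splitting the conditions into disjoint subspaces, and expanding a seed by adding conditions while the MSR stays at or below a threshold `delta`. The function names, the round-robin split, and the greedy expansion rule are hypothetical illustrations, not the authors' implementation; the evolutionary seed search and the merging step are omitted.

```python
import random
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def split_conditions(n_conditions: int, n_subsets: int, seed: int = 0) -> List[List[int]]:
    """Randomly partition condition indices 0..n_conditions-1 into
    n_subsets disjoint subspaces (hypothetical helper)."""
    rng = random.Random(seed)
    conditions = list(range(n_conditions))
    rng.shuffle(conditions)
    # Deal the shuffled indices round-robin so subset sizes differ by at most one.
    return [sorted(conditions[k::n_subsets]) for k in range(n_subsets)]


def mean_squared_residue(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Cheng-Church mean squared residue of the submatrix data[rows][cols].
    Assumed here as the homogeneity criterion; it is 0 for a perfectly
    additive bicluster a_ij = mu + r_i + c_j."""
    sub = [[data[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = 0.0
    for i in range(n_r):
        for j in range(n_c):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_r * n_c)


def expand_conditions(data: Matrix, rows: Sequence[int], cols: Sequence[int], delta: float) -> List[int]:
    """Greedily add conditions to a seed (rows fixed) while the MSR stays
    <= delta. A hypothetical expansion rule for illustration only."""
    cols = list(cols)
    for j in range(len(data[0])):
        if j not in cols and mean_squared_residue(data, rows, cols + [j]) <= delta:
            cols.append(j)
    return sorted(cols)


if __name__ == "__main__":
    # Columns 0-3 are additive (row offset + column offset); column 4 is noise.
    example = [
        [1, 2, 3, 4, 9],
        [2, 3, 4, 5, 0],
        [4, 5, 6, 7, 5],
    ]
    print(split_conditions(10, 3))
    print(expand_conditions(example, [0, 1, 2], [0, 1], delta=0.01))
```

Starting from the seed on conditions 0 and 1, the expansion admits conditions 2 and 3 (which keep the submatrix additive) and rejects condition 4. A real implementation would also need to expand the gene set and merge overlapping seeds.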