Madeira Sara C, Oliveira Arlindo L
University of Beira Interior, Rua Marquês D'Avila e Bolama, Covilhã, Portugal.
IEEE/ACM Trans Comput Biol Bioinform. 2004 Jan-Mar;1(1):24-45. doi: 10.1109/TCBB.2004.2.
A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results of applying standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions under which the activity of genes is uncorrelated. A similar limitation exists when conditions are clustered. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix have been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this paper, we refer to this class of algorithms as biclustering. Biclustering is also referred to in the literature as coclustering and direct clustering, among other names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering and classify them according to the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.
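To make the notion of a bicluster concrete, here is a minimal sketch, assuming a small toy genes × conditions matrix (invented for illustration, not data from the survey). A bicluster is represented as a pair of index sets (rows I, columns J), and its coherence is scored with the mean squared residue introduced by Cheng and Church (2000). That score is only one illustrative evaluation measure, not the classification scheme proposed in this survey, and it rewards additive patterns (each row a constant shift of the others); it does not capture every kind of "highly correlated" bicluster. The function names `submatrix` and `mean_squared_residue` are hypothetical.

```python
import numpy as np

def submatrix(data, rows, cols):
    """Extract the bicluster (I, J): the genes in `rows` restricted to the conditions in `cols`."""
    return data[np.ix_(rows, cols)]

def mean_squared_residue(block):
    """Mean squared residue in the style of Cheng and Church (illustrative choice).

    Residue of entry (i, j) = a_ij - rowmean_i - colmean_j + overall mean.
    The score is 0 for a perfectly additive bicluster.
    """
    row_means = block.mean(axis=1, keepdims=True)
    col_means = block.mean(axis=0, keepdims=True)
    overall = block.mean()
    residue = block - row_means - col_means + overall
    return float((residue ** 2).mean())

# Toy expression matrix: 4 genes (rows) x 5 conditions (columns); values are invented.
data = np.array([
    [1.0, 2.0, 9.0, 3.0, 5.0],
    [7.0, 0.0, 4.0, 8.0, 1.0],
    [6.0, 4.0, 2.0, 5.0, 7.0],
    [3.0, 9.0, 1.0, 0.0, 6.0],
])

# Genes 0 and 2 move together (gene 2 = gene 0 + 2) only on conditions 1, 3 and 4.
block = submatrix(data, [0, 2], [1, 3, 4])
print(block)
print(mean_squared_residue(block))  # zero up to floating-point rounding: a coherent bicluster
print(mean_squared_residue(data))   # large: the genes are not coherent across all conditions
```

The contrast between the two scores mirrors the motivation above: genes 0 and 2 are coherent on a subset of conditions, but clustering them over every condition would dilute that signal.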