Gan Xiangchao, Liew Alan Wee-Chung, Yan Hong
Department of Computer Science, King's College London, UK.
BMC Bioinformatics. 2008 Apr 23;9:209. doi: 10.1186/1471-2105-9-209.
In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. In many situations, however, a subset of genes exhibits a consistent pattern only over a subset of conditions. Conventional clustering algorithms, which operate on entire rows or columns of an expression matrix, therefore fail to detect these useful patterns in the data. Biclustering has recently been proposed to detect a subset of genes exhibiting a consistent pattern over a subset of conditions. However, most existing biclustering algorithms search for sub-matrices within a data matrix by optimizing heuristically defined merit functions. Moreover, most of these algorithms can detect only a restricted set of bicluster patterns.
In this paper, we present a novel geometric perspective on the biclustering problem. The biclustering process is interpreted as the detection of linear geometries in a high dimensional data space. This perspective views biclusters with different patterns as hyperplanes in a high dimensional space, and allows us to handle different types of linear patterns simultaneously by matching a specific set of linear geometries. The geometric viewpoint also inspires us to propose a generic bicluster pattern, the linear coherent model, which unifies the seemingly incompatible additive and multiplicative bicluster models. As a particular realization of our framework, we have implemented a Hough transform-based hyperplane detection algorithm. Experimental results on a human lymphoma gene expression dataset show that our algorithm can find biologically significant subsets of genes.
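The geometric reading can be illustrated in two dimensions. If the expression values of two conditions are plotted against each other, genes following an additive pattern fall on a line of slope 1 (y = x + c), genes following a multiplicative pattern fall on a line through the origin (y = kx), and the linear coherent model covers both as y = kx + c. The sketch below is a minimal illustration under stated assumptions, not the implementation used in the paper: it applies a classical two-parameter Hough transform to a single pair of conditions rather than hyperplane detection in the full data space, the data are synthetic, and the function name `strongest_line`, the grid sizes and the distance tolerance are all hypothetical choices.

```python
import numpy as np

def strongest_line(x, y, n_theta=180, n_rho=200):
    """Classical Hough transform: return (theta, rho) of the line
    x*cos(theta) + y*sin(theta) = rho that collects the most votes."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho_max = np.hypot(x, y).max()  # |rho| can never exceed this value
    # rho of every point (rows) for every candidate angle (columns)
    rhos = np.outer(x, np.cos(thetas)) + np.outer(y, np.sin(thetas))
    # Quantize rho from [-rho_max, rho_max] into n_rho accumulator bins
    bins = np.round((rhos + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
    bins = np.clip(bins, 0, n_rho - 1)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for t in range(n_theta):
        acc[t] = np.bincount(bins[:, t], minlength=n_rho)
    t_best, r_best = np.unravel_index(acc.argmax(), acc.shape)
    rho_best = r_best / (n_rho - 1) * 2 * rho_max - rho_max
    return thetas[t_best], rho_best

# Synthetic data (an assumption for illustration): 100 genes measured under
# two conditions; the first 30 follow the linear coherent pattern y = 2x + 1.
rng = np.random.default_rng(0)
n_genes, n_coherent = 100, 30
x = rng.uniform(-8, 8, n_genes)
y = rng.uniform(-8, 8, n_genes)
x[:n_coherent] = rng.uniform(-3, 3, n_coherent)
y[:n_coherent] = 2.0 * x[:n_coherent] + 1.0 + rng.normal(0, 0.02, n_coherent)

theta, rho = strongest_line(x, y)

# Genes whose points lie close to the detected line are bicluster candidates.
members = np.abs(x * np.cos(theta) + y * np.sin(theta) - rho) < 0.4
print(f"matched {members[:n_coherent].sum()} of {n_coherent} coherent genes, "
      f"plus {members[n_coherent:].sum()} background genes")
```

Because the accumulator is indexed by angle and offset rather than by a fixed slope, the same search picks up additive, multiplicative and general linear patterns without a separate merit function for each, which is the property the geometric viewpoint relies on.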
We have proposed a novel geometric interpretation of the biclustering problem. We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high dimensional data space. An implementation of the geometric framework using the Fast Hough transform for hyperplane detection can be used to discover biologically significant subsets of genes under subsets of conditions for microarray data analysis.