Liu Jinze, Wang Jiong, Wang Wei
Department of Computer Science, University of North Carolina, Chapel Hill, 27599, USA.
Proc IEEE Comput Syst Bioinform Conf. 2004:182-93. doi: 10.1109/csb.2004.1332431.
The advent of DNA microarray technologies has revolutionized the experimental study of gene expression. Clustering is the most popular approach to analyzing gene expression data and has proven successful in many applications. Our work focuses on discovering subsets of genes that exhibit similar expression patterns along a subset of conditions in the gene expression matrix. Specifically, we look for order-preserving clusters (OP-Clusters), in each of which a subset of genes induces a similar linear ordering along a subset of conditions. The pioneering OPSM model [3], which enforces a strict order shared by the genes in a cluster, is included in our model as a special case. Our model is more robust than OPSM because similarly expressed conditions are allowed to form order-equivalent groups, and no restriction is placed on the order within a group. Guided by our model, we design and implement a deterministic algorithm, OPCTree, to discover OP-Clusters. An experimental study on two real datasets demonstrates the effectiveness of the algorithm in tissue classification and cell cycle identification. In addition, a large percentage of OP-Clusters exhibit significant enrichment of one or more functional categories, which implies that OP-Clusters carry significant biological relevance.
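The difference between a strict shared order and an ordering with order-equivalent groups can be illustrated with a small Python sketch. It is not the OPCTree algorithm and does not enumerate clusters; it only tests whether each gene is consistent with one given sequence of conditions. The grouping rule (an absolute threshold `delta` measured from the smallest value in a group), the function names, and the toy expression values are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def order_groups(row: Sequence[float], delta: float) -> List[List[int]]:
    """Sort condition indices by expression value and merge conditions
    whose value lies within `delta` of the smallest value in the current
    group. The threshold rule is an assumption, not the paper's definition."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    groups: List[List[int]] = []
    for j in order:
        if groups and row[j] - row[groups[-1][0]] <= delta:
            groups[-1].append(j)   # similar enough: same equivalence group
        else:
            groups.append([j])     # start a new, higher-ranked group
    return groups


def supports(row: Sequence[float], pattern: Sequence[int], delta: float) -> bool:
    """True if the conditions in `pattern` appear in non-decreasing group
    rank for this gene; conditions in the same group may come in any order."""
    rank = {j: g for g, group in enumerate(order_groups(row, delta)) for j in group}
    return all(rank[a] <= rank[b] for a, b in zip(pattern, pattern[1:]))


def supporting_genes(matrix: Dict[str, Sequence[float]],
                     pattern: Sequence[int], delta: float) -> List[str]:
    """Names of the genes (rows) that are consistent with `pattern`."""
    return [g for g, row in matrix.items() if supports(row, pattern, delta)]


# Toy expression matrix (hypothetical values): rows are genes, columns are conditions 0..3.
matrix = {
    "g1": [1.0, 5.0, 5.2, 9.0],
    "g2": [2.0, 6.1, 6.0, 8.0],   # conditions 1 and 2 are nearly tied, in swapped order
    "g3": [9.0, 1.0, 5.0, 3.0],
}

relaxed = supporting_genes(matrix, [0, 1, 2, 3], delta=0.5)
strict = supporting_genes(matrix, [0, 1, 2, 3], delta=0.0)
print(relaxed)  # ['g1', 'g2']
print(strict)   # ['g1']
```

With `delta=0.0` every condition forms its own group, so the check reduces to a strict shared order and `g2` is rejected because its values for conditions 1 and 2 are swapped by a small margin. With `delta=0.5` those two conditions fall into one order-equivalent group and `g2` is accepted, which is the kind of robustness the abstract attributes to the model.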