A Condition-Enumeration Tree method for mining biclusters from DNA microarray data sets.

Author information

Chen Jiun-Rung, Chang Ye-In

Affiliation

Dept. of Computer Science and Engineering, National Sun Yat-Sen University, No. 70, Lienhai Rd., Kaohsiung 80424, Taiwan, ROC.

Publication information

Biosystems. 2009 Jul;97(1):44-59. doi: 10.1016/j.biosystems.2009.04.003. Epub 2009 Apr 23.

DOI:10.1016/j.biosystems.2009.04.003

PMID:19393714

Abstract

Biclustering, which performs simultaneous clustering of rows (e.g., genes) and columns (e.g., conditions), has proved of great value for finding interesting patterns from microarray data. To find biclusters, a model called pCluster was proposed. A pCluster consists of a set of genes and a set of conditions, where the expression levels of these genes have a similar variation under these conditions. Based on this model, most of the previous methods need to compute MDSs (maximum dimension sets) for every two genes in the microarray data. Since the number of genes is far larger than the number of conditions, this step is inefficient. Another method called MicroCluster was proposed. This method does not compute MDSs for every two genes, and transforms the problem into a graph problem. However, it needs to solve the Maximal Clique problem, which is NP-Complete. To avoid the above disadvantages, in this paper, we propose a new method, CE-Tree (Condition-Enumeration Tree), for finding pClusters. Instead of generating MDSs for every two genes, we generate only MDSs for every two conditions. Then, based only on these MDSs, we expand the CE-Tree in a special local breadth-first within global depth-first manner to efficiently find all pClusters. We also utilize the idea of the traditional hash join approach to efficiently support the CE-Tree. From the simulation results, we show that the CE-Tree method could find pClusters more efficiently than those previous methods.
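The abstract does not spell out the pCluster condition or how an MDS is built for a pair of conditions. The sketch below is one possible reading, not the authors' CE-Tree implementation. It assumes the pScore definition commonly given in the pCluster literature: for genes x, y and conditions a, b, pScore = |(d_xa - d_xb) - (d_ya - d_yb)|, and a submatrix is a pCluster when every such 2x2 pScore is at most a threshold delta. The function names `is_pcluster` and `condition_pair_mds`, the `min_genes` parameter, and the sample matrix are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def is_pcluster(data, genes, conditions, delta):
    """Check the assumed pCluster condition on a submatrix.

    data[g][c] is the expression level of gene g under condition c.
    Returns True if every 2x2 submatrix has pScore <= delta.
    """
    for x, y in combinations(genes, 2):
        for a, b in combinations(conditions, 2):
            p_score = abs((data[x][a] - data[x][b]) - (data[y][a] - data[y][b]))
            if p_score > delta:
                return False
    return True


def condition_pair_mds(data, a, b, delta, min_genes=2):
    """Maximal gene sets that are coherent under the condition pair (a, b).

    Genes are sorted by the difference data[g][a] - data[g][b]; a sliding
    window then collects every maximal run whose differences span at most
    delta. Each such run satisfies the pScore bound for conditions a and b.
    """
    diffs = sorted((row[a] - row[b], g) for g, row in enumerate(data))
    n = len(diffs)
    result = []
    end = 0
    last_end = -1  # right edge of the previous window
    for start in range(n):
        end = max(end, start)
        # Extend the window to the right while the spread stays within delta.
        while end + 1 < n and diffs[end + 1][0] - diffs[start][0] <= delta:
            end += 1
        # The window is maximal only if it reaches past the previous one.
        if end > last_end and end - start + 1 >= min_genes:
            result.append(sorted(g for _, g in diffs[start:end + 1]))
        last_end = end
    return result


# Hypothetical 4-gene x 3-condition expression matrix.
data = [
    [1, 4, 2],
    [3, 6, 4],
    [2, 5, 9],
    [10, 2, 5],
]
mds_01 = condition_pair_mds(data, 0, 1, delta=1)
```

With this sample matrix, genes 0, 1, and 2 all differ by the same amount between conditions 0 and 1, so they form a single maximal set for that pair. Because the number of conditions is typically far smaller than the number of genes, enumerating condition pairs yields far fewer pairs than enumerating gene pairs, which is the efficiency argument the abstract makes.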

Similar articles

A new geometric biclustering algorithm based on the Hough transform for analysis of large-scale microarray data.

J Theor Biol. 2008 Mar 21;251(2):264-74. doi: 10.1016/j.jtbi.2007.11.030. Epub 2007 Dec 4.

Mining subspace clusters from DNA microarray data using large itemset techniques.

J Comput Biol. 2009 May;16(5):745-68. doi: 10.1089/cmb.2008.0161.

Finding multiple coherent biclusters in microarray data using variable string length multiobjective genetic algorithm.

IEEE Trans Inf Technol Biomed. 2009 Nov;13(6):969-75. doi: 10.1109/TITB.2009.2017527. Epub 2009 Mar 16.

Possibilistic approach for biclustering microarray data.

Comput Biol Med. 2007 Oct;37(10):1426-36. doi: 10.1016/j.compbiomed.2007.01.005. Epub 2007 Mar 8.

Analysis of a Gibbs sampler method for model-based clustering of gene expression data.

Bioinformatics. 2008 Jan 15;24(2):176-83. doi: 10.1093/bioinformatics/btm562. Epub 2007 Nov 22.

Towards clustering of incomplete microarray data without the use of imputation.

Bioinformatics. 2007 Jan 1;23(1):107-13. doi: 10.1093/bioinformatics/btl555. Epub 2006 Oct 31.

A new outlier removal approach for cDNA microarray normalization.

Biotechniques. 2009 Aug;47(2):691-2, 694-700. doi: 10.2144/000113195.

Spectral analysis of two-signed microarray expression data.

Math Med Biol. 2007 Jun;24(2):131-48. doi: 10.1093/imammb/dql030. Epub 2006 Nov 28.

Mass distributed clustering: a new algorithm for repeated measurements in gene expression data.

Genome Inform. 2005;16(2):183-94.

Cited by

Pattern-driven neighborhood search for biclustering of microarray data.

BMC Bioinformatics. 2012 May 8;13 Suppl 7(Suppl 7):S11. doi: 10.1186/1471-2105-13-S7-S11.
