A general framework for biclustering gene expression data.

Authors

Li Haifeng, Chen Xin, Zhang Keshu, Jiang Tao

Affiliation

Center of Excellence in Genomic Science, University of Southern California, Los Angeles, CA 90089, USA.

Publication information

J Bioinform Comput Biol. 2006 Aug;4(4):911-33. doi: 10.1142/s021972000600217x.

DOI:10.1142/s021972000600217x

PMID:17007074

Abstract

A large number of biclustering methods have been proposed to detect patterns in gene expression data. Each of these methods targets a particular type of bicluster, but none can discover all the types of patterns in the data. Furthermore, researchers have to design new algorithms whenever biologists become interested in new types of biclusters or patterns. In this paper, we propose a novel approach to biclustering that, in general, can be used to discover all computable patterns in gene expression data. The method is based on the theory of Kolmogorov complexity. More precisely, we use Kolmogorov complexity to measure the randomness of submatrices as the merit of biclusters, because randomness amounts to a lack of regularity, and regularity is the common property of all types of patterns. On the basis of the algorithmic probability measure, we develop a Markov chain Monte Carlo algorithm to search for biclusters. Our method can also be easily extended to solve the problems of conventional clustering and checkerboard-type biclustering. Preliminary experiments on simulated as well as real data show that our approach is very versatile and promising.
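The abstract does not give the algorithm's details, so the following is only a rough sketch of the general idea, not the authors' method. Kolmogorov complexity is uncomputable, so the sketch swaps in zlib-compressed size as a computable stand-in, and it uses a plain Metropolis sampler rather than the paper's algorithmic probability measure. All names (`compressed_bits`, `energy`, `metropolis_bicluster`, `demo_matrix`), the energy definition, the temperature, and the assumption that expression values are already discretized to integers in 0..255 are hypothetical choices made for illustration.

```python
import math
import random
import zlib


def compressed_bits(matrix, rows, cols):
    """Bits zlib needs for the chosen submatrix (a proxy, not true Kolmogorov complexity)."""
    data = bytes(matrix[r][c] for r in sorted(rows) for c in sorted(cols))
    return 8 * len(zlib.compress(data, 9))


def energy(matrix, rows, cols, background_bits):
    """Compressed size minus the cost of the same cells at the background rate.

    Lower energy means the submatrix is more regular than the matrix as a whole.
    This scoring rule is an assumption of the sketch.
    """
    return compressed_bits(matrix, rows, cols) - background_bits * len(rows) * len(cols)


def metropolis_bicluster(matrix, steps=2000, temperature=8.0, seed=0):
    """Metropolis search over row/column subsets; returns (best_energy, rows, cols)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    whole = compressed_bits(matrix, set(range(n_rows)), set(range(n_cols)))
    background_bits = whole / (n_rows * n_cols)

    # Start from a random half of the rows and columns.
    rows = set(rng.sample(range(n_rows), max(1, n_rows // 2)))
    cols = set(rng.sample(range(n_cols), max(1, n_cols // 2)))
    current = energy(matrix, rows, cols, background_bits)
    best = (current, frozenset(rows), frozenset(cols))

    for _ in range(steps):
        new_rows, new_cols = set(rows), set(cols)
        # Symmetric proposal: toggle membership of one random row or column.
        if rng.random() < 0.5:
            new_rows ^= {rng.randrange(n_rows)}
        else:
            new_cols ^= {rng.randrange(n_cols)}
        if not new_rows or not new_cols:
            continue  # never move to an empty submatrix
        proposed = energy(matrix, new_rows, new_cols, background_bits)
        # Metropolis rule: always accept improvements, sometimes accept worse states.
        if proposed <= current or rng.random() < math.exp((current - proposed) / temperature):
            rows, cols, current = new_rows, new_cols, proposed
            if current < best[0]:
                best = (current, frozenset(rows), frozenset(cols))
    return best


def demo_matrix(size=30, seed=1):
    """Random 8-level background with a constant block planted in rows/cols 5..14."""
    rng = random.Random(seed)
    matrix = [[rng.randrange(8) for _ in range(size)] for _ in range(size)]
    for r in range(5, 15):
        for c in range(5, 15):
            matrix[r][c] = 3
    return matrix
```

Calling `metropolis_bicluster(demo_matrix())` returns the lowest-energy row and column sets visited during the run. Because the proxy depends on the compressor and on how the submatrix is serialized, it can only detect regularities that zlib happens to exploit, which is far narrower than the "all computable patterns" the abstract refers to.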

Similar articles

Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization.

BMC Bioinformatics. 2008 Apr 23;9:210. doi: 10.1186/1471-2105-9-210.

Discovering biclusters in gene expression data based on high-dimensional linear geometries.

BMC Bioinformatics. 2008 Apr 23;9:209. doi: 10.1186/1471-2105-9-209.

A systematic comparison and evaluation of biclustering methods for gene expression data.

Bioinformatics. 2006 May 1;22(9):1122-9. doi: 10.1093/bioinformatics/btl060. Epub 2006 Feb 24.

Application of simulated annealing to the biclustering of gene expression data.

IEEE Trans Inf Technol Biomed. 2006 Jul;10(3):519-25. doi: 10.1109/titb.2006.872073.

Parallelized evolutionary learning for detection of biclusters in gene expression data.

IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):560-70. doi: 10.1109/TCBB.2011.53. Epub 2011 Mar 3.

BicAT: a biclustering analysis toolbox.

Bioinformatics. 2006 May 15;22(10):1282-3. doi: 10.1093/bioinformatics/btl099. Epub 2006 Mar 21.

Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies.

BMC Bioinformatics. 2008 Oct 27;9:458. doi: 10.1186/1471-2105-9-458.

Shifting and scaling patterns from gene expression data.

Bioinformatics. 2005 Oct 15;21(20):3840-5. doi: 10.1093/bioinformatics/bti641. Epub 2005 Sep 6.

WF-MSB: a weighted fuzzy-based biclustering method for gene expression data.

Int J Data Min Bioinform. 2011;5(1):89-109. doi: 10.1504/ijdmb.2011.038579.

Cited by

ARBic: an all-round biclustering algorithm for analyzing gene expression data.

NAR Genom Bioinform. 2023 Jan 31;5(1):lqad009. doi: 10.1093/nargab/lqad009. eCollection 2023 Mar.

Biclustering methods: biological relevance and application in gene expression analysis.

PLoS One. 2014 Mar 20;9(3):e90801. doi: 10.1371/journal.pone.0090801. eCollection 2014.

QUBIC: a qualitative biclustering algorithm for analyses of gene expression data.

Nucleic Acids Res. 2009 Aug;37(15):e101. doi: 10.1093/nar/gkp491. Epub 2009 Jun 9.

An efficient voting algorithm for finding additive biclusters with random background.

J Comput Biol. 2008 Dec;15(10):1275-93. doi: 10.1089/cmb.2007.0219.
