A general framework for biclustering gene expression data.

Author information

Li Haifeng, Chen Xin, Zhang Keshu, Jiang Tao

Affiliation

Center of Excellence in Genomic Science, University of Southern California, Los Angeles, CA 90089, USA.

Publication information

J Bioinform Comput Biol. 2006 Aug;4(4):911-33. doi: 10.1142/s021972000600217x.

Abstract

A large number of biclustering methods have been proposed to detect patterns in gene expression data. Each of these methods finds some type of bicluster, but none can discover all the types of patterns in the data. Furthermore, researchers have to design new algorithms in order to find new types of biclusters/patterns that interest biologists. In this paper, we propose a novel approach to biclustering that, in general, can be used to discover all computable patterns in gene expression data. The method is based on the theory of Kolmogorov complexity. More precisely, we use Kolmogorov complexity to measure the randomness of submatrices as the merit of biclusters, because randomness naturally consists in a lack of regularity, and regularity is a common property of all types of patterns. On the basis of an algorithmic probability measure, we develop a Markov chain Monte Carlo algorithm to search for biclusters. Our method can also be easily extended to solve the problems of conventional clustering and checkerboard-type biclustering. Preliminary experiments on simulated as well as real data show that our approach is very versatile and promising.
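The abstract does not give the scoring function or the sampler, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes that zlib-compressed length can stand in for Kolmogorov complexity (which is uncomputable), that expression values have been discretized to integers in 0..255, and that a simple Metropolis-style toggle move is an acceptable search step. The names `complexity_proxy` and `search_bicluster`, the temperature, and the minimum bicluster size are all hypothetical.

```python
import math
import random
import zlib


def complexity_proxy(matrix, rows, cols):
    """Compressed bytes per cell of a submatrix (lower means more regular).

    Assumption: zlib is a crude stand-in for Kolmogorov complexity.
    Cells must be integers in 0..255 (discretized expression levels).
    """
    data = bytes(matrix[r][c] for r in sorted(rows) for c in sorted(cols))
    return len(zlib.compress(data, 9)) / len(data)


def search_bicluster(matrix, steps=2000, temperature=0.05, seed=0):
    """Metropolis-style search over (row set, column set) pairs.

    Returns (best_energy, best_rows, best_cols) seen during the walk.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    rows, cols = set(range(n_rows)), set(range(n_cols))
    energy = complexity_proxy(matrix, rows, cols)
    best = (energy, frozenset(rows), frozenset(cols))
    for _ in range(steps):
        new_rows, new_cols = set(rows), set(cols)
        # Propose toggling the membership of one random row or column.
        if rng.random() < 0.5:
            new_rows ^= {rng.randrange(n_rows)}
        else:
            new_cols ^= {rng.randrange(n_cols)}
        # Hypothetical minimum size, to rule out trivial biclusters.
        if len(new_rows) < 2 or len(new_cols) < 2:
            continue
        new_energy = complexity_proxy(matrix, new_rows, new_cols)
        # Always accept improvements; accept worse moves with
        # Boltzmann probability exp(-(increase) / temperature).
        accept = new_energy <= energy or rng.random() < math.exp(
            (energy - new_energy) / temperature
        )
        if accept:
            rows, cols, energy = new_rows, new_cols, new_energy
            if energy < best[0]:
                best = (energy, frozenset(rows), frozenset(cols))
    return best
```

Because the score is compressed size per cell, a constant block planted in random noise scores far lower than a noise block of the same size, which is the property the search exploits. A general-purpose compressor only detects some regularities, so this sketch is much weaker than the "all computable patterns" claim made for the method itself.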
