Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies.

Author information

DiMaggio Peter A, McAllister Scott R, Floudas Christodoulos A, Feng Xiao-Jiang, Rabinowitz Joshua D, Rabitz Herschel A

Affiliation

Department of Chemical Engineering, Princeton University, Princeton, NJ, USA.

Publication information

BMC Bioinformatics. 2008 Oct 27;9:458. doi: 10.1186/1471-2105-9-458.

Abstract

BACKGROUND

Clustering techniques are used to analyze large-scale data sets in a number of applications. Biclustering in particular has emerged as an important problem in the analysis of gene expression data, since genes may respond jointly over only a subset of conditions. Biclustering algorithms also have important applications in sample classification where, for instance, tissue samples can be classified as cancerous or normal. Many biclustering methods, and clustering algorithms in general, use simplified models or heuristic strategies to identify the "best" grouping of elements according to some metric and cluster definition, and thus result in suboptimal clusters.

RESULTS

In this article, we present a rigorous approach to biclustering, OREO, which is based on the Optimal RE-Ordering of the rows and columns of a data matrix so as to globally minimize the dissimilarity metric. The physical permutations of the rows and columns of the data matrix can be modeled as either a network flow problem or a traveling salesman problem. Cluster boundaries in one dimension are used to partition the matrix into submatrices, and the other dimension of each submatrix is then re-ordered to generate biclusters. The performance of OREO is tested on (a) metabolite concentration data, (b) an image reconstruction matrix, (c) synthetic data with implanted biclusters, and gene expression data for (d) colon cancer, (e) breast cancer, and (f) yeast segregants. These tests validate the ability of the proposed method and compare it to existing biclustering and clustering methods.
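
The re-ordering idea above can be sketched for small matrices as follows. This is a minimal illustration under stated assumptions, not OREO itself. The paper models the permutation as a network-flow or TSP problem; this sketch instead uses an exact Held-Karp dynamic program for an open TSP path, which only scales to roughly a dozen rows. Several details are assumptions made for illustration:

- The dissimilarity is taken to be the sum of squared differences between adjacent rows.
- The boundary rule cuts the ordered rows at the largest adjacent gaps.
- All function names (`row_cost`, `optimal_order`, `split_at_largest_gaps`, `oreo_like_biclusters`) are hypothetical.

```python
from itertools import combinations


def row_cost(a, b):
    # ASSUMPTION: dissimilarity between two adjacent rows (or columns)
    # is the sum of squared element-wise differences.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def optimal_order(vectors):
    """Exact minimum-cost open path through all vectors (Held-Karp DP).

    Stand-in for the paper's network-flow/TSP models; O(n^2 * 2^n),
    so only practical for small n.
    """
    n = len(vectors)
    if n <= 2:
        return list(range(n))
    d = [[row_cost(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # best[(mask, j)] = (cost, prev): cheapest path covering `mask` that ends at j
    best = {(1 << j, j): (0, None) for j in range(n)}
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            mask = sum(1 << s for s in subset)
            for j in subset:
                prev_mask = mask ^ (1 << j)
                best[(mask, j)] = min(
                    (best[(prev_mask, k)][0] + d[k][j], k) for k in subset if k != j
                )
    full = (1 << n) - 1
    end = min(range(n), key=lambda j: best[(full, j)][0])
    # Follow predecessor links back from the cheapest end point.
    order, mask, j = [], full, end
    while j is not None:
        order.append(j)
        prev = best[(mask, j)][1]
        mask ^= 1 << j
        j = prev
    return order[::-1]


def split_at_largest_gaps(vectors, order, n_clusters):
    # HYPOTHETICAL boundary rule: cut the ordered sequence at the
    # (n_clusters - 1) largest adjacent dissimilarities.
    gaps = [(row_cost(vectors[order[i]], vectors[order[i + 1]]), i + 1)
            for i in range(len(order) - 1)]
    cuts = sorted(pos for _, pos in sorted(gaps, reverse=True)[:n_clusters - 1])
    groups, start = [], 0
    for c in cuts + [len(order)]:
        groups.append(order[start:c])
        start = c
    return groups


def oreo_like_biclusters(matrix, n_row_clusters):
    # 1) optimally re-order rows; 2) split rows at cluster boundaries;
    # 3) optimally re-order the columns of each row submatrix.
    row_order = optimal_order(matrix)
    result = []
    for group in split_at_largest_gaps(matrix, row_order, n_row_clusters):
        sub = [matrix[r] for r in group]
        cols = [list(c) for c in zip(*sub)]  # columns of the submatrix
        result.append((group, optimal_order(cols)))
    return result
```

As an illustration, on the toy matrix `[[1, 1, 9, 9], [9, 9, 1, 1], [1, 2, 9, 8], [9, 8, 1, 2]]` with two row clusters, the sketch groups rows {0, 2} and {1, 3}. Each group is then returned with its own column order. Exact DP was chosen here only to keep the example self-contained and provably optimal. Realistic matrix sizes would need a dedicated MILP or TSP solver, as the paper describes.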

CONCLUSION

We demonstrate that this rigorous global optimization method for biclustering produces clusters with more insightful groupings of similar entities, such as genes or metabolites sharing common functions, than other clustering and biclustering algorithms. We also show that it can reconstruct underlying fundamental patterns in the data for several distinct data matrices arising in important biological applications.
