CLEAN: CLustering Enrichment ANalysis.

Authors

Freudenberg Johannes M, Joshi Vineet K, Hu Zhen, Medvedovic Mario

Affiliation

Laboratory for Statistical Genomics and Systems Biology, Department of Environmental Health, University of Cincinnati College of Medicine, 3223 Eden Av, ML 56, Cincinnati OH 45267-0056, USA.

Publication

BMC Bioinformatics. 2009 Jul 29;10:234. doi: 10.1186/1471-2105-10-234.

Abstract

BACKGROUND

Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, functional coherence of clusters established through such analyses has been used to identify biologically meaningful clusters, compare clustering algorithms and identify biological pathways associated with the biological process under investigation.

RESULTS

We developed a computational framework for analytically and visually integrating knowledge-based functional categories with the cluster analysis of genomics data. The framework is based on the simple, conceptually appealing, and biologically interpretable gene-specific functional coherence score (CLEAN score). The score is derived by correlating the clustering structure as a whole with functional categories of interest. We directly demonstrate that integrating biological knowledge in this way improves the reproducibility of conclusions derived from cluster analysis. The CLEAN score differentiates between the levels of functional coherence for genes within the same cluster based on their membership in enriched functional categories. We show that this aspect results in higher reproducibility across independent datasets and produces more informative genes for distinguishing different sample types than the scores based on the traditional cluster-wide analysis. We also demonstrate the utility of the CLEAN framework in comparing clusterings produced by different algorithms. CLEAN was implemented as an add-on R package and can be downloaded at http://Clusteranalysis.org. The package integrates routines for calculating gene-specific functional coherence scores and the open source interactive Java-based viewer Functional TreeView (FTreeView).
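
The idea of a gene-specific coherence score can be illustrated with a minimal Python sketch. It is not the CLEAN R package and does not reproduce its API. The abstract does not name the enrichment statistic, so the one-sided hypergeometric test used here is an assumption, and the function names `hypergeom_tail` and `gene_coherence_scores` are hypothetical. Each gene receives the largest -log10 p-value over all (cluster, category) pairs that contain it, so genes in the same cluster can receive different scores.

```python
from math import comb, log10


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the category."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def gene_coherence_scores(clusters, categories, universe):
    """Score each gene by its best enrichment among clusters and categories it is in.

    clusters:   iterable of sets of gene ids (e.g. every node of a cluster tree)
    categories: dict mapping a category name to a set of gene ids
    universe:   set of all gene ids in the analysis
    Assumption: enrichment is measured by an unadjusted hypergeometric p-value.
    """
    N = len(universe)
    scores = {gene: 0.0 for gene in universe}
    for cluster in clusters:
        members = cluster & universe
        for category in categories.values():
            annotated = category & universe
            overlap = members & annotated
            if not overlap:
                continue
            p = hypergeom_tail(len(overlap), len(members), len(annotated), N)
            score = -log10(p)
            # Only genes that are in both the cluster and the category get credit.
            for gene in overlap:
                scores[gene] = max(scores[gene], score)
    return scores


# Toy data: 10 genes, one cluster, one functional category (all hypothetical).
universe = {f"g{i}" for i in range(10)}
clusters = [{"g0", "g1", "g2", "g3"}]
categories = {"A": {"g0", "g1", "g2", "g7"}}
scores = gene_coherence_scores(clusters, categories, universe)
```

In this toy example, `g0`, `g1` and `g2` share a positive score, while `g3`, which sits in the same cluster but outside the enriched category, keeps a score of zero. A cluster-wide score would assign the same value to all four genes.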

CONCLUSION

Our results indicate that using the gene-specific functional coherence score improves the reproducibility of the conclusions made about clusters of co-expressed genes over using the traditional cluster-wide scores. Using gene-specific coherence scores also simplifies the comparisons of clusterings produced by different clustering algorithms and provides a simple tool for selecting genes with a "functionally coherent" expression profile.
