Family relationships: should consensus reign?--consensus clustering for protein families.

Authors

Nikolski Macha, Sherman David J

Affiliation

CNRS/LaBRI, Université Bordeaux 1, 351 cours de la Libération, 33405 Talence Cedex, France.

Publication

Bioinformatics. 2007 Jan 15;23(2):e71-6. doi: 10.1093/bioinformatics/btl314.

PMID:17237108

Abstract

MOTIVATION

Reliable identification of protein families is key to phylogenetic analysis, functional annotation and the exploration of protein function diversity in a given phylogenetic branch. As more complete genomes are sequenced, there is a growing need for powerful and reliable algorithms that facilitate the construction of protein families.

RESULTS

We formulate the construction of protein families as an instance of consensus clustering and design a novel algorithm for it that is computationally efficient in practice and produces high-quality results. The algorithm uses an election method to construct consensus families from competing clustering computations, and it is tailored to the specific needs of comparative genomics projects. First, it provides a robust means of incorporating results from different and complementary clustering methods, which avoids an a priori choice of method that could bias the results. Second, its practical efficiency makes it suitable for large-scale projects. Third, it produces high-quality results in which families tend to represent groupings by biological function.
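The abstract does not describe the election method in detail, so the following is only an illustrative sketch of the general idea of consensus clustering by voting, not the authors' algorithm. It assumes a simple rule: two proteins are joined when a strict majority of the input clusterings place them in the same family, and consensus families are the connected components of those winning pairs. The function name `consensus_families`, the `threshold` parameter and the protein identifiers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def consensus_families(clusterings, threshold=0.5):
    """Build consensus families by pairwise majority vote (illustrative only).

    clusterings: list of partitions, each a list of sets of protein IDs.
    threshold:   fraction of clusterings that must co-cluster a pair for the
                 pair to be joined (hypothetical parameter).
    """
    proteins = sorted({p for part in clusterings for fam in part for p in fam})

    # Each clustering casts one vote for every pair it places in one family.
    votes = defaultdict(int)
    for part in clusterings:
        for fam in part:
            for a, b in combinations(sorted(fam), 2):
                votes[(a, b)] += 1

    # Union-find over proteins, merging pairs that win the vote.
    parent = {p: p for p in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    needed = threshold * len(clusterings)
    for (a, b), n in votes.items():
        if n > needed:
            parent[find(a)] = find(b)

    # Collect connected components as the consensus families.
    families = defaultdict(set)
    for p in proteins:
        families[find(p)].add(p)
    return sorted((sorted(f) for f in families.values()), key=lambda f: f[0])


if __name__ == "__main__":
    # Three hypothetical clustering methods applied to five proteins.
    method_a = [{"p1", "p2", "p3"}, {"p4", "p5"}]
    method_b = [{"p1", "p2"}, {"p3", "p4", "p5"}]
    method_c = [{"p1", "p2", "p3"}, {"p4"}, {"p5"}]
    print(consensus_families([method_a, method_b, method_c]))
    # [['p1', 'p2', 'p3'], ['p4', 'p5']]
```

One limitation of this simplified rule is that connected components are transitive: two proteins can end up in the same consensus family through a chain of winning pairs even if they did not win a majority themselves. A real election scheme would need to resolve such conflicts explicitly.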

AVAILABILITY

This method has been used by the Génolevures project to compute protein families of hemiascomycetous yeasts. The data are available online at http://cbi.labri.fr/Genolevures/fam/

Similar articles

Incremental generation of summarized clustering hierarchy for protein family analysis.

Bioinformatics. 2004 Nov 1;20(16):2586-96. doi: 10.1093/bioinformatics/bth290. Epub 2004 May 6.

Bayesian search of functionally divergent protein subgroups and their function specific residues.

Bioinformatics. 2006 Oct 15;22(20):2466-74. doi: 10.1093/bioinformatics/btl411. Epub 2006 Jul 26.

Efficient functional clustering of protein sequences using the Dirichlet process.

Bioinformatics. 2008 Aug 15;24(16):1765-71. doi: 10.1093/bioinformatics/btn244. Epub 2008 May 29.

Modelling interaction sites in protein domains with interaction profile hidden Markov models.

Bioinformatics. 2006 Dec 1;22(23):2851-7. doi: 10.1093/bioinformatics/btl486. Epub 2006 Sep 25.

Graph-based consensus clustering for class discovery from gene expression data.

Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.

Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

Bioinformatics. 2007 Dec 1;23(23):3147-54. doi: 10.1093/bioinformatics/btm505. Epub 2007 Oct 17.

Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks.

BMC Bioinformatics. 2005 Oct 3;6:242. doi: 10.1186/1471-2105-6-242.

Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles.

Bioinformatics. 2008 Jun 1;24(11):1359-66. doi: 10.1093/bioinformatics/btn133. Epub 2008 Apr 10.

Detecting protein dissimilarities in multiple alignments using Bayesian variable selection.

Bioinformatics. 2007 Jan 15;23(2):245-6. doi: 10.1093/bioinformatics/btl566. Epub 2006 Nov 14.

Cited by

Binding interface change and cryptic variation in the evolution of protein-protein interactions.

BMC Evol Biol. 2016 Feb 18;16:40. doi: 10.1186/s12862-016-0608-1.

Inferring gene family histories in yeast identifies lineage specific expansions.

PLoS One. 2014 Jun 12;9(6):e99480. doi: 10.1371/journal.pone.0099480. eCollection 2014.

Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.

PLoS One. 2013;8(2):e46468. doi: 10.1371/journal.pone.0046468. Epub 2013 Feb 15.

A genome-scale metabolic model of the lipid-accumulating yeast Yarrowia lipolytica.

BMC Syst Biol. 2012 May 4;6:35. doi: 10.1186/1752-0509-6-35.

Identification of conserved gene clusters in multiple genomes based on synteny and homology.

BMC Bioinformatics. 2011 Oct 5;12 Suppl 9:S18. doi: 10.1186/1471-2105-12-S9-S18.

IONS: Identification of Orthologs by Neighborhood and Similarity-an Automated Method to Identify Orthologs in Chromosomal Regions of Common Evolutionary Ancestry and its Application to Hemiascomycetous Yeasts.

Evol Bioinform Online. 2011;7:123-33. doi: 10.4137/EBO.S7465. Epub 2011 Aug 30.

Genome-wide computational prediction of tandem gene arrays: application in yeasts.

BMC Genomics. 2010 Jan 21;11:56. doi: 10.1186/1471-2164-11-56.

Combined phylogeny and neighborhood analysis of the evolution of the ABC transporters conferring multiple drug resistance in hemiascomycete yeasts.

BMC Genomics. 2009 Oct 1;10:459. doi: 10.1186/1471-2164-10-459.

Comparative genomics of protoploid Saccharomycetaceae.

Genome Res. 2009 Oct;19(10):1696-709. doi: 10.1101/gr.091546.109. Epub 2009 Jun 12.

Génolevures: protein families and synteny among complete hemiascomycetous yeast proteomes and genomes.

Nucleic Acids Res. 2009 Jan;37(Database issue):D550-4. doi: 10.1093/nar/gkn859. Epub 2008 Nov 16.
