The Bologna Annotation Resource: a non-hierarchical method for the functional and structural annotation of protein sequences relying on a comparative large-scale genome analysis.

Authors

Bartoli Lisa, Montanucci Ludovica, Fronza Raffaele, Martelli Pier Luigi, Fariselli Piero, Carota Luciana, Donvito Giacinto, Maggi Giorgio P, Casadio Rita

Affiliation

Biocomputing Group, CIRB/Dept of Biology, University of Bologna, Italy.

Publication

J Proteome Res. 2009 Sep;8(9):4362-71. doi: 10.1021/pr900204r.

Abstract

Protein sequence annotation is a major challenge in the postgenomic era. With complete genomes and proteomes now available, protein annotation has benefited greatly from cross-genome comparisons. In this work, we describe a new non-hierarchical clustering procedure characterized by a stringent metric that ensures a reliable transfer of function between related proteins, even for multidomain and distantly related proteins. The method relies on the comparative analysis of 599 completely sequenced genomes, from both prokaryotes and eukaryotes, and on a GO and PDB/SCOP mapping over the clusters. A statistical validation demonstrates that our clustering technique captures the essential information shared between homologous and distantly related protein sequences. As a result, uncharacterized proteins can be safely annotated by inheriting the annotation of their cluster. We validate the method by blindly annotating a further 201 genomes, and finally we develop BAR (the Bologna Annotation Resource), a prediction server for protein functional annotation based on a total of 800 genomes (publicly available at http://microserf.biocomp.unibo.it/bar/).
