The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation.

Affiliation

Biocomputing Group, BiGeA/CIG, 'Luigi Galvani' Interdepartmental Center for Integrated Studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, Bologna 40126, Italy.

Publication information

Nucleic Acids Res. 2017 Jul 3;45(W1):W285-W290. doi: 10.1093/nar/gkx330.

Abstract

BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase the data volume, the query capabilities and the information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure applied to UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. GO terms and PFAM domains that pass statistical validation become cluster-specific and are used to annotate new sequences that enter the cluster by satisfying the similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters. Of these clusters, 22.2% (22 241 661 sequences) have at least one validated GO term and 47.4% (24 555 055 sequences) have at least one PFAM domain. PDB structures are present in 1.4% of the clusters (36% of all sequences); each of these clusters is associated with a hidden Markov model that allows building template-target alignments suitable for structural modeling. A further 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB accession, FASTA sequence, GO term, PFAM domain, organism, PDB and ligand(s). When evaluated on the CAFA2 targets, BAR 3.0 substantially outperforms our previous version and ranks among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3.
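The clustering criteria stated in the abstract can be illustrated with a minimal sketch. It assumes that clusters are the connected components of a graph whose edges are the sequence pairs passing both thresholds; the abstract only says the procedure is graph-based, so this choice, the `(id_a, id_b, identity, coverage)` input format and all function names are hypothetical and are not the BAR 3.0 implementation.

```python
from collections import defaultdict

MIN_IDENTITY = 0.40  # sequence identity threshold quoted in the abstract
MIN_COVERAGE = 0.90  # alignment coverage threshold quoted in the abstract


def cluster_sequences(ids, alignments):
    """Return connected components of the pairwise-similarity graph.

    `alignments` is an iterable of (id_a, id_b, identity, coverage) tuples,
    a hypothetical input format chosen for this sketch.
    """
    parent = {seq_id: seq_id for seq_id in ids}

    def find(x):
        # Follow parent links to the representative, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in alignments:
        # An edge exists only when BOTH constraints are satisfied.
        if identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(set)
    for seq_id in ids:
        groups[find(seq_id)].add(seq_id)
    return list(groups.values())


def split_clusters_and_singletons(components):
    """Separate multi-sequence clusters from sequences left on their own."""
    clusters = [c for c in components if len(c) > 1]
    singletons = [next(iter(c)) for c in components if len(c) == 1]
    return clusters, singletons


if __name__ == "__main__":
    ids = ["P1", "P2", "P3", "P4"]
    alignments = [
        ("P1", "P2", 0.55, 0.95),  # passes both thresholds
        ("P2", "P3", 0.42, 0.91),  # passes both thresholds
        ("P3", "P4", 0.80, 0.50),  # high identity, but coverage too low
    ]
    clusters, singletons = split_clusters_and_singletons(
        cluster_sequences(ids, alignments)
    )
    print([sorted(c) for c in clusters], singletons)
```

In this toy input, P4 aligns to P3 with high identity but stays a singleton because the coverage constraint fails, which shows why the two thresholds are applied jointly.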


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5159/5570247/b9d4ddb0052b/gkx330fig1.jpg
