BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation.

Authors

Dudek Christian-Alexander, Dannheim Henning, Schomburg Dietmar

Affiliation

Department of Bioinformatics and Biochemistry, Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106 Braunschweig, Germany.

Publication information

PLoS One. 2017 Jul 27;12(7):e0182216. doi: 10.1371/journal.pone.0182216. eCollection 2017.

DOI: 10.1371/journal.pone.0182216

PMID: 28750104

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5531587/

Abstract

Predicting gene function is crucial for many areas of the life sciences. Faster high-throughput sequencing techniques generate more and larger datasets, and manual annotation by classical wet-lab experiments cannot keep pace with these amounts of data. We showed earlier that the automatic, sequence pattern-based BrEPS protocol, which builds on manually curated sequences, can be used to predict the enzymatic functions of genes. The growing sequence databases provide the opportunity for more reliable patterns but also pose a challenge for the implementation of automatic protocols. We reimplemented and optimized the BrEPS pattern generation so that it can be applied to larger datasets on an acceptable timescale. The primary improvement of the new BrEPS protocol is the enhanced data selection step. Manually curated annotations from Swiss-Prot are used as a reliable source for function prediction of enzymes observed at the protein level. The pool of sequences is extended by highly similar sequences from TrEMBL and Swiss-Prot. This allows us to restrict the selection of Swiss-Prot entries without losing the diversity of sequences needed to generate significant patterns. Additionally, a supporting pattern type was introduced by extending the patterns at semi-conserved positions with highly similar amino acids. Extended patterns have an increased complexity, which increases the chance of matching more sequences without losing the essential structural information of the pattern. To enhance the usability of the database, we introduced enzyme function prediction based on consensus EC numbers and IUBMB enzyme nomenclature. BrEPS is part of the Braunschweig Enzyme Database (BRENDA) and is available on a completely redesigned website and as a download. The database can be downloaded and used with the BrEPScmd command line tool for large-scale sequence analysis.
The BrEPS website and downloads for the database creation tool, command line tool and database are freely accessible at http://breps.tu-bs.de.
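
The two ideas in the abstract that are easiest to misread, extending a pattern at semi-conserved positions and forming a consensus EC number, can be illustrated with a small sketch. This is not the BrEPS implementation. The pattern representation (a list of allowed-residue strings with `x` as a wildcard), the amino acid similarity groups, and the function names `extend_pattern`, `to_regex`, and `consensus_ec` are all assumptions made for illustration. The sketch also assumes that a consensus EC number is the shared leading part of several EC numbers, which is one plausible reading of the term.

```python
import re

# Hypothetical similarity groups, chosen for illustration only; the abstract
# does not state which amino acids BrEPS treats as "highly similar".
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("DE"), set("KR"), set("ST"), set("NQ")]


def extend_position(allowed: str) -> str:
    """Add every residue that shares a similarity group with an allowed residue."""
    extended = set(allowed)
    for group in SIMILAR_GROUPS:
        if extended & group:
            extended |= group
    return "".join(sorted(extended))


def extend_pattern(pattern: list[str]) -> list[str]:
    """Extend only semi-conserved positions (more than one allowed residue).

    Fully conserved positions (one residue) and wildcards ("x") are kept as-is,
    so the core of the pattern does not change.
    """
    return [extend_position(p) if len(p) > 1 else p for p in pattern]


def to_regex(pattern: list[str]) -> re.Pattern:
    """Translate the list representation into a compiled regular expression."""
    parts = []
    for position in pattern:
        if position == "x":
            parts.append(".")  # wildcard: any residue
        elif len(position) == 1:
            parts.append(position)  # fully conserved residue
        else:
            parts.append("[" + position + "]")  # semi-conserved position
    return re.compile("".join(parts))


def consensus_ec(ec_numbers: list[str]) -> str:
    """Return the shared leading EC levels, padding the rest with '-'.

    Assumption: "consensus" means the common prefix of the given EC numbers.
    """
    shared = []
    for level in zip(*(ec.split(".") for ec in ec_numbers)):
        if len(set(level)) != 1:
            break
        shared.append(level[0])
    return ".".join(shared + ["-"] * (4 - len(shared)))


base = ["G", "IL", "x", "D", "K"]  # made-up pattern, not taken from BrEPS
extended = extend_pattern(base)

print(to_regex(base).pattern)
print(to_regex(extended).pattern)
print(bool(to_regex(base).search("MAGVTDKL")))
print(bool(to_regex(extended).search("MAGVTDKL")))
print(consensus_ec(["1.1.1.1", "1.1.1.2"]))
```

In this toy case the extended pattern accepts a sequence with valine at the semi-conserved position, which the base pattern rejects, while the conserved residues are untouched. That mirrors the trade-off the abstract describes: more sequences can match without changing the essential structure of the pattern.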
