BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation.

Authors

Dudek Christian-Alexander, Dannheim Henning, Schomburg Dietmar

Affiliation

Department of Bioinformatics and Biochemistry, Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106 Braunschweig, Germany.

Publication information

PLoS One. 2017 Jul 27;12(7):e0182216. doi: 10.1371/journal.pone.0182216. eCollection 2017.

Abstract

The prediction of gene functions is crucial for a large number of different life science areas. Faster high-throughput sequencing techniques generate more and larger datasets. Manual annotation by classical wet-lab experiments is not suitable for these large amounts of data. We showed earlier that the automatic sequence pattern-based BrEPS protocol, based on manually curated sequences, can be used for the prediction of enzymatic functions of genes. The growing sequence databases provide the opportunity for more reliable patterns, but are also a challenge for the implementation of automatic protocols. We reimplemented and optimized the BrEPS pattern generation to be applicable to larger datasets in an acceptable timescale. The primary improvement of the new BrEPS protocol is the enhanced data selection step. Manually curated annotations from Swiss-Prot are used as a reliable source for function prediction of enzymes observed at the protein level. The pool of sequences is extended by highly similar sequences from TrEMBL and Swiss-Prot. This allows us to restrict the selection of Swiss-Prot entries without losing the diversity of sequences needed to generate significant patterns. Additionally, a supporting pattern type was introduced by extending the patterns at semi-conserved positions with highly similar amino acids. Extended patterns have an increased complexity, increasing the chance to match more sequences, without losing the essential structural information of the pattern. To enhance the usability of the database, we introduced enzyme function prediction based on consensus EC numbers and IUBMB enzyme nomenclature. BrEPS is part of the Braunschweig Enzyme Database (BRENDA) and is available on a completely redesigned website and as a download. The database can be downloaded and used with the BrEPScmd command line tool for large-scale sequence analysis.
The BrEPS website and downloads for the database creation tool, command line tool and database are freely accessible at http://breps.tu-bs.de.
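
The idea of extending a pattern at semi-conserved positions can be illustrated with a minimal Python sketch. It is not the BrEPS implementation: the pattern representation (one set of allowed residues per position), the similarity groups in `SIMILAR_GROUPS`, and the helper names `extend_pattern` and `to_regex` are all assumptions made for illustration only.

```python
import re

# Hypothetical groups of physicochemically similar amino acids.
# This grouping is an assumption, not the one used by BrEPS.
SIMILAR_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FYW"),   # aromatic
    set("KR"),    # basic
    set("DE"),    # acidic
    set("NQ"),    # amide
    set("ST"),    # hydroxyl
]


def extend_pattern(pattern):
    """Return a copy of `pattern` with semi-conserved positions extended.

    `pattern` is a list of sets, one per position. A one-element set is a
    fully conserved position and is left unchanged; an empty set is a
    wildcard; a set with several residues is treated as semi-conserved and
    gains every residue from any similarity group it overlaps.
    """
    extended = []
    for allowed in pattern:
        new_allowed = set(allowed)
        if len(allowed) > 1:  # semi-conserved position
            for group in SIMILAR_GROUPS:
                if allowed & group:
                    new_allowed |= group
        extended.append(new_allowed)
    return extended


def to_regex(pattern):
    """Translate the list-of-sets pattern into a regular expression."""
    parts = []
    for allowed in pattern:
        if not allowed:
            parts.append(".")  # wildcard position
        elif len(allowed) == 1:
            parts.append(next(iter(allowed)))
        else:
            parts.append("[" + "".join(sorted(allowed)) + "]")
    return "".join(parts)


# Toy pattern: G, then I or L, any residue, D, then S or T, K.
strict = [set("G"), set("IL"), set(), set("D"), set("ST"), set("K")]
extended = extend_pattern(strict)

sequence = "MAGVADSKQ"  # made-up sequence with V where the strict pattern needs I/L
strict_hit = re.search(to_regex(strict), sequence)
extended_hit = re.search(to_regex(extended), sequence)
```

In this toy case the strict pattern `G[IL].D[ST]K` misses the sequence because of the valine at the second position, while the extended pattern `G[ILMV].D[ST]K` matches it. The conserved positions (G, D, K) are untouched, which mirrors the abstract's point that extension raises the chance of a match without discarding the core of the pattern.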
