
DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe.

Authors

Wang Tianmin, Mori Hiroshi, Zhang Chong, Kurokawa Ken, Xing Xin-Hui, Yamada Takuji

Affiliations

Department of Biological Information, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 2-12-1 M6-3, Ookayama, Meguro-ku, Tokyo, 152-8550, Japan.

Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China.

Publication information

BMC Bioinformatics. 2015 Mar 21;16:96. doi: 10.1186/s12859-015-0499-y.

DOI: 10.1186/s12859-015-0499-y
PMID: 25888481
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4389672/
Abstract

BACKGROUND

Computational predictions of catalytic function are vital for in-depth understanding of enzymes. Because several novel approaches performing better than the common BLAST tool are rarely applied in research, we hypothesized that there is a large gap between the number of known annotated enzymes and the actual number in the protein universe, which significantly limits our ability to extract additional biologically relevant functional information from the available sequencing data. To reliably expand the enzyme space, we developed DomSign, a highly accurate domain signature-based enzyme functional prediction tool to assign Enzyme Commission (EC) digits.

RESULTS

DomSign is a top-down prediction engine that yields results comparable, or superior, to those from many benchmark EC number prediction tools, including BLASTP, when a homolog with an identity >30% is not available in the database. Performance tests showed that DomSign is a highly reliable enzyme EC number annotation tool. After multiple tests, the accuracy is thought to be greater than 90%. Thus, DomSign can be applied to large-scale datasets, with the goal of expanding the enzyme space with high fidelity. Using DomSign, we successfully increased the percentage of EC-tagged enzymes from 12% to 30% in UniProt-TrEMBL. In the Kyoto Encyclopedia of Genes and Genomes bacterial database, the percentage of EC-tagged enzymes for each bacterial genome could be increased from 26.0% to 33.2% on average. Metagenomic mining was also efficient, as exemplified by the application of DomSign to the Human Microbiome Project dataset, recovering nearly one million new EC-labeled enzymes.
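
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a top-down, domain signature-based EC assignment could look like, not the authors' implementation. It assumes that a "domain signature" is the set of domains found in a protein, that reference proteins with known EC numbers are grouped by signature, and that a digit is assigned only while the dominant choice covers a minimum share of those reference proteins. The function names, the `threshold` parameter, the `DOM_*` identifiers, and the toy reference data are all hypothetical.

```python
from collections import Counter, defaultdict


def build_signature_index(reference):
    """Group reference EC numbers by domain signature (an unordered set of domain IDs).

    `reference` is an iterable of (domain_ids, ec_string) pairs, where ec_string
    has four dot-separated fields such as "1.1.1.1" or "1.1.-.-".
    """
    index = defaultdict(list)
    for domains, ec in reference:
        index[frozenset(domains)].append(tuple(ec.split(".")))
    return index


def predict_ec(domains, index, threshold=0.8):
    """Assign EC digits from the first level downward (hypothetical rule).

    A digit is kept only if the most common value at that level is shared by at
    least `threshold` of all reference proteins carrying the same signature.
    Unresolved levels are reported as "-". Returns None for unknown signatures.
    """
    candidates = index.get(frozenset(domains), [])
    total = len(candidates)
    prefix = []
    for level in range(4):
        if not candidates:
            break
        digit, count = Counter(ec[level] for ec in candidates).most_common(1)[0]
        if digit == "-" or count / total < threshold:
            break  # stop descending: deeper digits are not supported well enough
        prefix.append(digit)
        candidates = [ec for ec in candidates if ec[level] == digit]
    if not prefix:
        return None
    return ".".join(prefix + ["-"] * (4 - len(prefix)))


# Toy reference set (invented for illustration only).
reference = [
    (["DOM_A", "DOM_B"], "1.1.1.1"),
    (["DOM_A", "DOM_B"], "1.1.1.1"),
    (["DOM_A", "DOM_B"], "1.1.1.2"),
    (["DOM_A", "DOM_B"], "1.1.1.1"),
    (["DOM_C"], "3.2.1.4"),
]
index = build_signature_index(reference)

print(predict_ec(["DOM_B", "DOM_A"], index))                 # 1.1.1.-
print(predict_ec(["DOM_A", "DOM_B"], index, threshold=0.7))  # 1.1.1.1
print(predict_ec(["DOM_X"], index))                          # None
```

In this sketch, stopping at a partial EC number such as `1.1.1.-` trades specificity for reliability: a query still receives a coarse annotation when the reference proteins disagree on the final digit.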

CONCLUSIONS

Our results offer preliminary confirmation of the existence of the hypothesized huge number of "hidden enzymes" in the protein universe, the identification of which could substantially further our understanding of the metabolisms of diverse organisms and also facilitate bioengineering by providing a richer enzyme resource. Furthermore, our results highlight the necessity of using more advanced computational tools than BLAST in protein database annotations to extract additional biologically relevant functional information from the available biological sequences.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb0c/4389672/52ffa95b9192/12859_2015_499_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb0c/4389672/def254f1a6a7/12859_2015_499_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb0c/4389672/aecab68bdcee/12859_2015_499_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb0c/4389672/15e527cee74d/12859_2015_499_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb0c/4389672/221603d37ec4/12859_2015_499_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb0c/4389672/5800f4aea4bb/12859_2015_499_Fig6_HTML.jpg