Genome-wide enzyme annotation with precision control: catalytic families (CatFam) databases.

Author information

Yu Chenggang, Zavaljevski Nela, Desai Valmik, Reifman Jaques

Affiliation

Biotechnology HPC Software Applications Institute, Telemedicine and Advanced Technology Research Center, US Army Medical Research and Materiel Command, Fort Detrick, MD 21702-5012, USA.

Publication information

Proteins. 2009 Feb 1;74(2):449-60. doi: 10.1002/prot.22167.

DOI:10.1002/prot.22167

PMID:18636476

Abstract

In this article, we present a new method termed CatFam (Catalytic Families) to automatically infer the functions of catalytic proteins, which account for 20-40% of all proteins in living organisms and play a critical role in a variety of biological processes. CatFam is a sequence-based method that generates sequence profiles to represent and infer protein catalytic functions. CatFam generates profiles through a stepwise procedure that carefully controls profile quality and employs nonenzymes as negative samples to establish profile-specific thresholds associated with a predefined nominal false-positive rate (FPR) of predictions. The adjustable FPR allows for fine precision control of each profile and enables the generation of profile databases that meet different needs: function annotation with high precision and hypothesis generation with moderate precision but better recall. Multiple tests of CatFam databases (generated with distinct nominal FPRs) against enzyme and nonenzyme datasets show that the method's predictions have consistently high precision and recall. For example, a 1% FPR database predicts protein catalytic functions for a dataset of enzymes and nonenzymes with 98.6% precision and 95.0% recall. Comparisons of CatFam databases against other established profile-based methods for the functional annotation of 13 bacterial genomes indicate that CatFam consistently achieves higher precision and (in most cases) higher recall, and that (on average) CatFam provides 21.9% additional catalytic functions not inferred by the other similarly reliable methods. These results strongly suggest that the proposed method provides a valuable contribution to the automated prediction of protein catalytic functions. The CatFam databases and the database search program are freely available at http://www.bhsai.org/downloads/catfam.tar.gz.
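The abstract does not give the details of how CatFam derives its profile-specific thresholds, so the following is only a minimal sketch of the general idea: score a set of nonenzymes (negative samples) against one profile, then choose the cutoff so that the fraction of negatives scoring above it does not exceed a nominal FPR. The function names `threshold_for_fpr` and `precision_recall`, the "score strictly above the cutoff" decision rule, and the toy scores are all assumptions made for illustration and are not taken from the CatFam software.

```python
import math


def threshold_for_fpr(negative_scores, nominal_fpr):
    """Pick a score cutoff for one profile from negative-sample scores.

    Hypothetical helper: a sequence is called a hit when its score is
    strictly greater than the returned cutoff, so at most
    floor(nominal_fpr * N) of the N negatives are called hits.
    """
    if not negative_scores:
        raise ValueError("need at least one negative score")
    if not 0.0 <= nominal_fpr < 1.0:
        raise ValueError("nominal_fpr must be in [0, 1)")
    ranked = sorted(negative_scores, reverse=True)
    # Number of false positives tolerated at this nominal rate.
    allowed = min(math.floor(nominal_fpr * len(ranked)), len(ranked) - 1)
    # The (allowed + 1)-th highest negative score becomes the cutoff.
    return ranked[allowed]


def precision_recall(positive_scores, negative_scores, cutoff):
    """Precision and recall when hits are scores strictly above the cutoff."""
    tp = sum(1 for s in positive_scores if s > cutoff)
    fp = sum(1 for s in negative_scores if s > cutoff)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / len(positive_scores) if positive_scores else float("nan")
    return precision, recall


# Toy, made-up scores for a single profile (not real CatFam data).
nonenzyme_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
enzyme_scores = [5.5, 6.5, 9.0, 10.0]

cutoff = threshold_for_fpr(nonenzyme_scores, nominal_fpr=0.25)
precision, recall = precision_recall(enzyme_scores, nonenzyme_scores, cutoff)
print(cutoff, precision, recall)
```

Lowering `nominal_fpr` raises the cutoff, which trades recall for precision; this mirrors the abstract's distinction between a high-precision database for function annotation and a more permissive one for hypothesis generation.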

Similar articles

Genome-wide enzyme annotation with precision control: catalytic families (CatFam) databases.

Proteins. 2009 Feb 1;74(2):449-60. doi: 10.1002/prot.22167.

The development of PIPA: an integrated and automated pipeline for genome-wide protein function annotation.

BMC Bioinformatics. 2008 Jan 25;9:52. doi: 10.1186/1471-2105-9-52.

Automatic annotation of protein function based on family identification.

Proteins. 2003 Nov 15;53(3):683-92. doi: 10.1002/prot.10449.

Filtering erroneous protein annotation.

Bioinformatics. 2004 Aug 4;20 Suppl 1:i342-7. doi: 10.1093/bioinformatics/bth938.

The Bologna annotation resource: a non hierarchical method for the functional and structural annotation of protein sequences relying on a comparative large-scale genome analysis.

J Proteome Res. 2009 Sep;8(9):4362-71. doi: 10.1021/pr900204r.

Accurate sequence-based prediction of catalytic residues.

Bioinformatics. 2008 Oct 15;24(20):2329-38. doi: 10.1093/bioinformatics/btn433. Epub 2008 Aug 18.

Vector-G: multi-modular SVM-based heterotrimeric G protein prediction.

In Silico Biol. 2008;8(2):141-55.

Prediction of signal peptides in protein sequences by neural networks.

Acta Biochim Pol. 2008;55(2):261-7. Epub 2008 May 26.

Optimizing the size of the sequence profiles to increase the accuracy of protein sequence alignments generated by profile-profile algorithms.

Bioinformatics. 2008 May 1;24(9):1145-53. doi: 10.1093/bioinformatics/btn097. Epub 2008 Mar 12.

PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data.

Proteins. 2009 Feb 15;74(3):566-82. doi: 10.1002/prot.22172.

Cited by

Annotating the microbial dark matter with HiFi-NN.

iScience. 2025 Apr 18;28(6):112480. doi: 10.1016/j.isci.2025.112480. eCollection 2025 Jun 20.

Exploring the enzymatic repertoires of Bacteria and Archaea and their associations with metabolic maps.

Braz J Microbiol. 2024 Dec;55(4):3147-3157. doi: 10.1007/s42770-024-01462-3. Epub 2024 Jul 25.

Machine Learning and Deep Learning in Synthetic Biology: Key Architectures, Applications, and Challenges.

ACS Omega. 2024 Feb 19;9(9):9921-9945. doi: 10.1021/acsomega.3c05913. eCollection 2024 Mar 5.

Enzyme Commission Number Prediction and Benchmarking with Hierarchical Dual-core Multitask Learning Framework.

Research (Wash D C). 2023 May 31;6:0153. doi: 10.34133/research.0153. eCollection 2023.

Improving automatic GO annotation with semantic similarity.

BMC Bioinformatics. 2022 Dec 12;23(Suppl 2):433. doi: 10.1186/s12859-022-04958-7.

Synthetic Biology Meets Machine Learning.

Methods Mol Biol. 2023;2553:21-39. doi: 10.1007/978-1-0716-2617-7_2.

Architect: A tool for aiding the reconstruction of high-quality metabolic models through improved enzyme annotation.

PLoS Comput Biol. 2022 Sep 8;18(9):e1010452. doi: 10.1371/journal.pcbi.1010452. eCollection 2022 Sep.

Genomic and Phenotypic Characterization of Bacteriophages Isolated from Acne Patients.

Antibiotics (Basel). 2022 Aug 2;11(8):1041. doi: 10.3390/antibiotics11081041.

Comparative genomics of DNA-binding transcription factors in archaeal and bacterial organisms.

PLoS One. 2021 Jul 2;16(7):e0254025. doi: 10.1371/journal.pone.0254025. eCollection 2021.

Deciphering the functional diversity of DNA-binding transcription factors in Bacteria and Archaea organisms.

PLoS One. 2020 Aug 21;15(8):e0237135. doi: 10.1371/journal.pone.0237135. eCollection 2020.
