Genome-wide enzyme annotation with precision control: catalytic families (CatFam) databases.

Author information

Yu Chenggang, Zavaljevski Nela, Desai Valmik, Reifman Jaques

Affiliation

Biotechnology HPC Software Applications Institute, Telemedicine and Advanced Technology Research Center, US Army Medical Research and Materiel Command, Fort Detrick, MD 21702-5012, USA.

Publication information

Proteins. 2009 Feb 1;74(2):449-60. doi: 10.1002/prot.22167.

Abstract

In this article, we present a new method termed CatFam (Catalytic Families) to automatically infer the functions of catalytic proteins, which account for 20-40% of all proteins in living organisms and play a critical role in a variety of biological processes. CatFam is a sequence-based method that generates sequence profiles to represent and infer protein catalytic functions. CatFam generates profiles through a stepwise procedure that carefully controls profile quality and employs nonenzymes as negative samples to establish profile-specific thresholds associated with a predefined nominal false-positive rate (FPR) of predictions. The adjustable FPR allows for fine precision control of each profile and enables the generation of profile databases that meet different needs: function annotation with high precision and hypothesis generation with moderate precision but better recall. Multiple tests of CatFam databases (generated with distinct nominal FPRs) against enzyme and nonenzyme datasets show that the method's predictions have consistently high precision and recall. For example, a 1% FPR database predicts protein catalytic functions for a dataset of enzymes and nonenzymes with 98.6% precision and 95.0% recall. Comparisons of CatFam databases against other established profile-based methods for the functional annotation of 13 bacterial genomes indicate that CatFam consistently achieves higher precision and (in most cases) higher recall, and that (on average) CatFam provides 21.9% additional catalytic functions not inferred by the other similarly reliable methods. These results strongly suggest that the proposed method provides a valuable contribution to the automated prediction of protein catalytic functions. The CatFam databases and the database search program are freely available at http://www.bhsai.org/downloads/catfam.tar.gz.
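The abstract summarizes how each profile gets its own threshold from nonenzyme negative samples at a chosen nominal FPR. The sketch below shows one way such a threshold could be derived. It is not the authors' code. It assumes each profile produces a numeric match score where higher means a stronger match; the abstract does not specify the scoring scheme. It also assumes a protein is predicted to carry the function only when its score is strictly greater than the threshold. All function names and the example scores and counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def profile_threshold(negative_scores: Sequence[float], nominal_fpr: float) -> float:
    """Return a score threshold for one profile (hypothetical helper).

    Assumed decision rule: predict the profile's function only if score > threshold.
    The threshold is chosen so that at most floor(nominal_fpr * n) of the n
    negative (nonenzyme) scores exceed it.
    """
    if not negative_scores:
        raise ValueError("need at least one negative score")
    if not 0.0 <= nominal_fpr < 1.0:
        raise ValueError("nominal_fpr must be in [0, 1)")
    ranked = sorted(negative_scores, reverse=True)  # highest-scoring negatives first
    allowed = math.floor(nominal_fpr * len(ranked))  # false positives we tolerate
    # Only the entries before index `allowed` (there are `allowed` of them) can
    # score strictly higher than ranked[allowed], so the bound holds even with ties.
    return ranked[allowed]


def empirical_fpr(negative_scores: Sequence[float], threshold: float) -> float:
    """Fraction of negatives that would be (wrongly) predicted at this threshold."""
    false_positives = sum(1 for s in negative_scores if s > threshold)
    return false_positives / len(negative_scores)


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up nonenzyme scores for a single hypothetical profile.
    negatives = [12.0, 30.5, 8.2, 15.1, 22.7, 9.9, 11.4, 18.3, 27.0, 14.6]
    t = profile_threshold(negatives, nominal_fpr=0.1)
    print(t)                            # 27.0
    print(empirical_fpr(negatives, t))  # 0.1
    # Made-up prediction counts, only to show how the two metrics are computed.
    print(precision_recall(tp=95, fp=5, fn=5))
```

Lowering the nominal FPR raises each profile's threshold. This trades recall for precision, which mirrors the abstract's distinction between high-precision annotation databases and more permissive databases meant for hypothesis generation.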
