CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database.

Author information

Computer Science and Mathematics Division.

Publication information

Glycobiology. 2010 Dec;20(12):1574-84. doi: 10.1093/glycob/cwq106. Epub 2010 Aug 9.

Abstract

The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
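
The second approach relies on association rules that link protein family domains to CAZy families. The abstract does not give the exact algorithm, thresholds, or data layout, so the following is only a minimal sketch of single-item rule mining (domain -> family) using support and confidence. The records, the domain labels (`DOM_A`, `DOM_B`, `DOM_C`), the function names, and the threshold values are all hypothetical placeholders, not values from the paper or from the CAT service.

```python
from collections import Counter

# Hypothetical toy records: (set of domains in a protein, set of CAZy families).
# The domain labels are placeholders, not real database accessions.
RECORDS = [
    ({"DOM_A"}, {"GH5"}),
    ({"DOM_A", "DOM_B"}, {"GH5", "CBM3"}),
    ({"DOM_B"}, {"CBM3"}),
    ({"DOM_A"}, {"GH5"}),
    ({"DOM_C"}, {"GH9"}),
]


def mine_rules(records, min_support=0.2, min_confidence=0.8):
    """Return {(domain, family): (support, confidence)} for rules domain -> family.

    support    = fraction of all records containing both domain and family
    confidence = fraction of records with the domain that also have the family
    The threshold defaults are assumptions chosen for the toy data.
    """
    n = len(records)
    domain_counts = Counter()
    pair_counts = Counter()
    for domains, families in records:
        for d in domains:
            domain_counts[d] += 1
            for f in families:
                pair_counts[(d, f)] += 1

    rules = {}
    for (d, f), count in pair_counts.items():
        support = count / n
        confidence = count / domain_counts[d]
        if support >= min_support and confidence >= min_confidence:
            rules[(d, f)] = (support, confidence)
    return rules


def annotate(domains, rules):
    """Predict CAZy families for a protein from its domains using mined rules."""
    return sorted({f for (d, f) in rules if d in domains})


rules = mine_rules(RECORDS)
predicted = annotate({"DOM_A", "DOM_B"}, rules)
```

In this toy setting, a domain that co-occurs with a family only occasionally (for example `DOM_A` with `CBM3`) falls below the confidence threshold and yields no rule, which is the basic mechanism by which such rules can favor specificity. A sequence-similarity search, the first approach in the abstract, would be a separate step and is not shown here.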
