Annotation of functional sites with the Conserved Domain Database.

Author information

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38 A, Room 8N805, 8600 Rockville Pike, Bethesda, MD 20894, USA.

Publication information

Database (Oxford). 2012 Mar 20;2012:bar058. doi: 10.1093/database/bar058. Print 2012.

Abstract

The overwhelming fraction of proteins whose sequences have been collected in comprehensive databases may never be assessed for function experimentally. Commonly, putative function is assigned based on similarity to experimentally characterized homologs, either on the level of the entire protein or for single evolutionarily conserved domains. The annotation of individual sites provides more detailed insights regarding the correspondence between sequence and function, as well as context for the interpretation of sequence variation and the outcomes of experiments. In general, site annotation has to be extracted from the published literature, and can often be transferred to closely related sequence neighbors. The National Center for Biotechnology Information's Conserved Domain Database (CDD) provides a system for curators to record functional sites (such as active sites or binding sites for cofactors) or characteristic sites (such as signature motifs) that are conserved across domain families, and for the transfer of that annotation to protein database sequences via high-confidence domain matches. Recently, CDD curators have begun to sort site annotations into seven categories (active, polypeptide binding, nucleic acid binding, ion binding, chemical binding, post-translational modification and other). Here we present a first comparative analysis of sites obtained via domain model matches, juxtaposed with existing site annotation encountered in high-quality data sets. Site annotation derived from domain annotation has the potential to cover large fractions of protein sequences, and we observe that CDD-based site annotation complements existing site annotation in many cases. This may, in part, originate from CDD's curation practice of collecting sites conserved across diverse taxa and supported by evidence from multiple 3D structures.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5166/3308149/1909d6c22ceb/bar058f1.jpg
