
CATH: increased structural coverage of functional space.

Affiliations

Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK.

Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor 43600, Malaysia.

Publication information

Nucleic Acids Res. 2021 Jan 8;49(D1):D266-D273. doi: 10.1093/nar/gkaa1079.

Abstract

CATH (https://www.cathdb.info) identifies domains in protein structures from the wwPDB and classifies them into evolutionary superfamilies, thereby providing structural and functional annotations. It is offered at two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, which adds derived data such as predicted sequence domains and functionally coherent sequence subsets (Functional Families, or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, adding 65,351 fully classified domain structures (+15%) for a total of 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. FunFams now capture three times more sequences, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages, which display variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH, including SCOP, InterPro, Aquaria and 2DProt.
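The growth figures quoted for release 4.3 can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes (the abstract does not say so explicitly) that each percentage is relative to the previous CATH+ release, so the earlier domain count is taken as the new total minus the added domains. Because "+59%" is a rounded figure, the implied earlier count of predicted sequence domains is only approximate. All variable and function names are illustrative.

```python
# Minimal sketch: sanity-check the growth figures quoted in the abstract.
# Assumption (not stated in the abstract): each "+X%" is relative to the
# previous release, so previous = current - added for structural domains.

def percent_increase(previous: float, current: float) -> float:
    """Relative growth from previous to current, in percent."""
    return 100.0 * (current - previous) / previous

# Figures quoted for CATH+ v4.3
total_domains = 500_238
added_domains = 65_351

previous_domains = total_domains - added_domains  # implied pre-4.3 count
domain_growth = percent_increase(previous_domains, total_domains)

# Predicted sequence domains: 151 million after a quoted (rounded) +59%,
# so the implied earlier count is roughly 151 / 1.59 million.
seq_domains_millions = 151
implied_previous_seq_millions = seq_domains_millions / 1.59

print(previous_domains)                         # 434887
print(round(domain_growth, 1))                  # 15.0
print(round(implied_previous_seq_millions, 1))  # 95.0
```

The result agrees with the quoted +15% for structural domains, which supports reading the percentages as release-over-release growth.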
