ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains.

Author information

Alborzi Seyed Ziaeddin, Devignes Marie-Dominique, Ritchie David W

Affiliations

Université de Lorraine, LORIA, UMR 7503, Vandœuvre-lès-Nancy, 54506, France.

Inria Nancy Grand-Est, Villers-lès-Nancy, 54600, France.

Publication information

BMC Bioinformatics. 2017 Feb 13;18(1):107. doi: 10.1186/s12859-017-1519-x.

Abstract

BACKGROUND

Many entries in the Protein Data Bank (PDB) are annotated with their component protein domains according to the Pfam classification, and with their biological function through the Enzyme Commission (EC) numbering scheme. However, although the biological activity of many proteins arises from specific domain-domain and domain-ligand interactions, current online resources rarely provide a direct mapping from structure to function at the domain level. Since the PDB now contains many tens of thousands of protein chains, and since protein sequence databases can exceed such numbers by orders of magnitude, there is a pressing need for automatic structure-function annotation tools that operate at the domain level.

RESULTS

This article presents ECDomainMiner, a novel content-based filtering approach to automatically infer associations between EC numbers and Pfam domains. ECDomainMiner finds a total of 20,728 non-redundant EC-Pfam associations with an F-measure of 0.95 with respect to a "Gold Standard" test set extracted from InterPro. Compared to the 1515 manually curated EC-Pfam associations in InterPro, the ECDomainMiner result represents a 13-fold increase in the number of EC-Pfam associations.
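The two headline figures above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the F-measure is the balanced F1 score (the abstract does not say which variant was used). The function names and the confusion counts passed to `f_measure` are hypothetical and chosen only for illustration; they are not the paper's evaluation data. Only the counts 20,728 and 1515 come from the abstract.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true-positive, false-positive and
    false-negative counts. Assumes the F1 variant; the abstract does not specify."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fold_ratio(inferred: int, curated: int) -> float:
    """Ratio of inferred associations to manually curated ones."""
    return inferred / curated


# Counts quoted in the abstract: 20,728 inferred vs 1515 curated associations.
ratio = fold_ratio(20728, 1515)
print(f"fold ratio: {ratio:.1f}")

# Hypothetical confusion counts (NOT from the paper) that happen to give F1 = 0.95.
score = f_measure(tp=95, fp=5, fn=5)
print(f"F-measure: {score:.2f}")
```

The ratio works out to roughly 13.7, which is consistent with the 13-fold figure reported in the abstract.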

CONCLUSION

These EC-Pfam associations could be used to annotate some 58,722 protein chains in the PDB which currently lack any EC annotation. The ECDomainMiner database is publicly available at http://ecdm.loria.fr/.
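To make the annotation idea concrete, here is a minimal sketch of how a table of EC-Pfam associations might be used to suggest EC numbers for a chain whose Pfam domains are known. It is an illustration only, not the ECDomainMiner implementation or its database interface. The function names and all identifiers (`EC_A`, `PF_X`, and so on) are made-up placeholders, not real EC numbers or Pfam accessions.

```python
from collections import defaultdict


def build_index(associations):
    """Map each Pfam domain to the set of EC numbers associated with it."""
    index = defaultdict(set)
    for ec, pfam in associations:
        index[pfam].add(ec)
    return index


def candidate_ecs(chain_domains, index):
    """Collect the EC numbers associated with any domain found on a chain."""
    found = set()
    for pfam in chain_domains:
        # .get avoids creating empty entries for unknown domains
        found |= index.get(pfam, set())
    return sorted(found)


# Placeholder associations (hypothetical, for illustration only).
associations = [("EC_A", "PF_X"), ("EC_B", "PF_X"), ("EC_C", "PF_Y")]
index = build_index(associations)

# A chain carrying one known domain (PF_X) and one without any association (PF_Z).
suggestions = candidate_ecs(["PF_X", "PF_Z"], index)
print(suggestions)
```

A real pipeline would presumably also carry a confidence score per association and filter on it; that detail is left out here because the abstract does not describe it.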

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c148/5307852/105971b82fb1/12859_2017_1519_Fig1_HTML.jpg