FunFam protein families improve residue level molecular function prediction.

Affiliations

Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.

Department of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK.

Publication information

BMC Bioinformatics. 2019 Jul 18;20(1):400. doi: 10.1186/s12859-019-2988-x.

Abstract

BACKGROUND

The CATH database provides a hierarchical classification of protein domain structures, including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding-site annotations within these FunFams and incorporated FunFams into the prediction of protein binding residues.
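The abstract does not spell out how agreement of binding-residue annotations within a family is scored. The following is a minimal sketch, assuming each member's binding annotation is given as a set of alignment-column indices and that agreement between two members is their Jaccard overlap (intersection over union), averaged over all member pairs. The metric, the function names `pairwise_agreement` and `family_agreement`, and the toy data are illustrative assumptions, not the paper's definition.

```python
from __future__ import annotations

from itertools import combinations
from statistics import mean


def pairwise_agreement(a: set[int], b: set[int]) -> float | None:
    """Jaccard overlap of two members' binding columns (assumed metric).

    Returns None when neither member has any binding annotation,
    so such pairs are skipped instead of being scored as 0 or 1.
    """
    union = a | b
    if not union:
        return None
    return len(a & b) / len(union)


def family_agreement(annotations: dict[str, set[int]]) -> float:
    """Mean pairwise agreement over all member pairs of one family (hypothetical helper)."""
    scores = []
    for p, q in combinations(annotations, 2):
        s = pairwise_agreement(annotations[p], annotations[q])
        if s is not None:
            scores.append(s)
    return mean(scores) if scores else float("nan")


# Toy family (made-up data): binding alignment columns per member.
family = {"P1": {10, 11, 12, 40}, "P2": {10, 11, 40, 41}, "P3": {11, 12}}
print(round(family_agreement(family), 3))  # pair scores 0.6, 0.5, 0.2 -> prints 0.433
```

Comparing this family-level score against the same score for randomly grouped proteins, or for proteins sharing an EC number, gives the kind of fold-change comparison reported in the results.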

RESULTS

FunFam members agreed, on average, in 36.9 ± 0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping the predictions of de novo binding residue prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFams yielded consensus predictions for those residues that were aligned and predicted alike (binding or non-binding) within a FunFam. This simple consensus increased the F1-score for binding residues 1.5-fold over the original prediction method. Varying the threshold for how many proteins had to agree in the consensus prediction provided a convenient control over the trade-off between accuracy/precision and coverage/recall; for example, a stringent threshold reached a precision as high as 60.8 ± 0.4%.
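The threshold-controlled consensus can be sketched as follows, assuming every family member's per-residue predictions (e.g. from BindPredict-CCS) have already been mapped onto alignment columns as sets of predicted-binding columns. In this sketch a column is called binding when at least a fraction `threshold` of members predict binding there, and non-binding otherwise; this voting rule, the helper names, and the toy data are illustrative assumptions rather than the paper's exact procedure. The scoring helper computes precision, recall, and the F1-score for the binding class.

```python
from __future__ import annotations

from collections import Counter


def consensus_binding(predictions: dict[str, set[int]], threshold: float) -> set[int]:
    """Columns predicted as binding by at least `threshold` (0 < threshold <= 1) of members."""
    n_members = len(predictions)
    votes = Counter(col for cols in predictions.values() for col in cols)
    return {col for col, k in votes.items() if k / n_members >= threshold}


def precision_recall_f1(predicted: set[int], observed: set[int]) -> tuple[float, float, float]:
    """Precision, recall and F1 for the binding class against observed binding columns."""
    tp = len(predicted & observed)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(observed) if observed else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (made up): predicted binding columns per member, and observed binding columns.
predictions = {"P1": {1, 2, 3, 7}, "P2": {2, 3, 8}, "P3": {2, 3, 4}, "P4": {3, 9}}
observed = {2, 3, 4, 5}

for threshold in (0.25, 0.5, 1.0):
    p, r, f = precision_recall_f1(consensus_binding(predictions, threshold), observed)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f} F1={f:.2f}")
# threshold=0.25 precision=0.43 recall=0.75 F1=0.55
# threshold=0.50 precision=1.00 recall=0.50 F1=0.67
# threshold=1.00 precision=1.00 recall=0.25 F1=0.40
```

On this toy data, tightening the threshold raises precision and lowers recall, which is the kind of accuracy/coverage control the abstract describes; the actual numbers depend entirely on the made-up input.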

CONCLUSIONS

FunFams outperformed even carefully curated EC numbers in terms of agreement between binding-site residues. Additionally, we expect that our proof of principle, the prediction of protein binding residues, will be relevant to many other approaches that use FunFams to infer functional information at the residue level.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/348d/6639920/ce8518b1e7a2/12859_2019_2988_Fig1_HTML.jpg
