Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs.

Affiliation

INSERM, U973, Paris F-75013, France.

Publication information

BMC Bioinformatics. 2011 Jun 20;12:247. doi: 10.1186/1471-2105-12-247.

Abstract

BACKGROUND

One strategy for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function.

RESULTS

Here, we present a systematic extraction of seven-residue structural motifs from protein loops and explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which simplifies protein structures into one-dimensional sequences, combined with advanced pattern statistics adapted to short sequences. Motifs of interest are selected as those that are significantly over-represented in the protein loops of SCOP superfamilies. We discovered two types of significantly over-represented structural motifs: (i) ubiquitous motifs, shared by several superfamilies, and (ii) superfamily-specific motifs, over-represented in only a few superfamilies. A comparison of ubiquitous motifs with known small structural motifs shows that they include well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and the biological annotations of Swiss-Prot reveals that some of them correspond to functional sites involved in binding small ligands, such as ATP/GTP, NAD(P) and SAH/SAM.
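The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the HMM-SA encoding is not reproduced, and the loops are assumed to be already available as strings of structural letters. The word length `k=4` rests on the assumption that each structural letter describes a four-residue fragment, so that a four-letter word spans seven residues. The exact binomial upper tail used here is a simple stand-in for the pattern statistics mentioned in the abstract, and no multiple-testing correction is applied. All function names, the threshold `alpha` and the toy data are hypothetical.

```python
from collections import Counter
from math import comb


def count_words(sequences, k=4):
    """Count every overlapping k-letter word in a list of letter strings."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def binomial_upper_tail(n, x, p):
    """Exact P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x, n + 1))


def over_represented_words(loops_by_family, family, k=4, alpha=0.01):
    """Return {word: p-value} for words enriched in one family.

    The background frequency of a word is estimated from the loops of all
    families pooled together (including the family under test), so it is
    never zero for a word observed in that family.
    """
    pooled = [s for seqs in loops_by_family.values() for s in seqs]
    background = count_words(pooled, k)
    n_background = sum(background.values())

    observed = count_words(loops_by_family[family], k)
    n_observed = sum(observed.values())

    results = {}
    for word, x in observed.items():
        p = background[word] / n_background
        p_value = binomial_upper_tail(n_observed, x, p)
        if p_value < alpha:
            results[word] = p_value
    return results


# Toy data (hypothetical): letters stand in for structural-alphabet states.
toy_loops = {
    "sfA": ["ABCDABCDABCD"],
    "sfB": ["EFGHIJKL" * 50],
}
enriched_in_sfA = over_represented_words(toy_loops, "sfA")
```

In this toy example the word `ABCD` occurs only in `sfA`, so it is flagged there, whereas `EFGH` occurs in `sfB` at roughly its pooled background rate and is not flagged. A real analysis would need a correction for the large number of words tested and a statistic that accounts for overlapping occurrences.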

CONCLUSIONS

Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4222/3158783/ec109ab930e9/1471-2105-12-247-1.jpg
