Use of designed sequences in protein structure recognition.

Affiliations

Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India.

Present address: Institute for Research in Biomedicine (IRB), Parc Cientific de Barcelona, C/ Baldiri Reixac 10, 08028, Barcelona, Spain.

Publication information

Biol Direct. 2018 May 9;13(1):8. doi: 10.1186/s13062-018-0209-6.

Abstract

BACKGROUND

Knowledge of protein structure is a prerequisite for an improved understanding of molecular function. The sequence-structure gap has widened in the post-genomic era. Grouping related protein sequences into families can help narrow this gap. In the Pfam database, a structure description is available for part or the full length of the proteins in 7726 families. For the remaining 52% of families, no information on 3-D structure is yet available. We use computationally designed sequences that are intermediately related to two protein domain families already known to share the same fold. These strategically designed sequences enable detection of distant relationships, and here we employ them for structure recognition of protein families of as yet unknown structure.

RESULTS

We first measured the success rate of our approach on a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of as yet unknown structure, we made structural assignments for part or the full length of the proteins. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation.

CONCLUSION

The results indicate that knowledge-based filling of gaps in protein sequence space is a productive approach to structure recognition. The designed sequences assist in traversing protein sequence space and effectively function as 'linkers' where natural linkers between distant proteins are unavailable.
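The 'linker' idea can be pictured as a path search in a graph of sequence-search hits. The sketch below is a toy illustration only, not the authors' pipeline: the function `find_fold_path`, the node names (`family_X`, `designed_1`, `family_A`, `family_B`) and the fold label `fold_1` are all hypothetical, and the hit graph is invented for the example. It assumes that a family of unknown structure has no direct hit to any family of known fold, but does hit a designed sequence that in turn hits two families sharing a fold.

```python
from collections import deque


def find_fold_path(hits, query, known_fold):
    """Breadth-first search from `query` through the hit graph.

    Returns the shortest path (as a list of node names) to the first
    node with a known fold, or None if no such node is reachable.
    """
    queue = deque([[query]])
    seen = {query}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node != query and node in known_fold:
            return path
        # Sort neighbours so the traversal order is deterministic.
        for neighbour in sorted(hits.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical toy data: family_X has no direct hit to family_A or
# family_B, but a designed intermediate sequence connects them.
hits = {
    "family_X": {"designed_1"},
    "designed_1": {"family_X", "family_A", "family_B"},
    "family_A": {"designed_1"},
    "family_B": {"designed_1"},
}
known_fold = {"family_A": "fold_1", "family_B": "fold_1"}

path = find_fold_path(hits, "family_X", known_fold)
if path is not None:
    print(" -> ".join(path), "| assigned fold:", known_fold[path[-1]])
```

In this toy graph the designed sequence is the only bridge between the query family and the families of known fold, which is the role the abstract ascribes to designed 'linkers'. A real analysis would derive the edges from profile-based searches with significance thresholds, which are not specified here.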

REVIEWERS

This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d78/5960202/a29e167286f2/13062_2018_209_Fig1_HTML.jpg