Use of designed sequences in protein structure recognition.

Affiliations

Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India.

Present address: Institute for Research in Biomedicine (IRB), Parc Cientific de Barcelona, C/ Baldiri Reixac 10, 08028, Barcelona, Spain.

Publication information

Biol Direct. 2018 May 9;13(1):8. doi: 10.1186/s13062-018-0209-6.

DOI: 10.1186/s13062-018-0209-6
PMID: 29776380
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5960202/
Abstract

BACKGROUND

Knowledge of protein structure is a prerequisite for an improved understanding of molecular function. The gap in sequence-structure space has widened in the post-genomic era. Grouping related protein sequences into families can help narrow this gap. In the Pfam database, a structural description is available for part or all of the proteins in 7726 families. For the remaining 52% of families, no information on 3-D structure is yet available. We use computationally designed sequences that are intermediately related to two protein domain families already known to share the same fold. These strategically designed sequences enable the detection of distant relationships; here, we employ them for structure recognition in protein families of as yet unknown structure.

RESULTS

We first measured the success rate of our approach on a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of as yet unknown structure, we made structural assignments for part or the full length of the proteins. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation.

CONCLUSION

The results indicate that knowledge-based filling of gaps in protein sequence space is a fruitful approach for structure recognition. Such sequences assist in traversing protein sequence space and effectively function as 'linkers' where natural linkers between distant proteins are unavailable.

REVIEWERS

This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.
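
The 'linker' idea described in the abstract can be illustrated with a minimal sketch. It assumes that significant sequence-search hits have already been computed and are supplied as plain pairs of identifiers. The function name `find_linked_families`, the `max_hops` parameter and all identifiers in the example are hypothetical and are not taken from the paper. The sketch only shows how an intermediate designed sequence can connect a family of unknown structure to a family of known fold; it is not the authors' pipeline.

```python
from collections import deque


def find_linked_families(hits, query_family, known_fold_families, max_hops=2):
    """Breadth-first search over an undirected graph of significant hits.

    hits: iterable of (node_a, node_b) pairs; a node is either a family name
          or a designed-sequence identifier (all names are hypothetical).
    max_hops: maximum number of edges allowed between the query family and a
          known-fold family (2 = query -> designed sequence -> known family).
    Returns a dict mapping each reachable known-fold family to the path used.
    """
    # Build an adjacency map; a hit is treated as symmetric (an assumption).
    graph = {}
    for a, b in hits:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    paths = {query_family: [query_family]}
    queue = deque([query_family])
    while queue:
        node = queue.popleft()
        # Do not extend paths that have already used the allowed hops.
        if len(paths[node]) - 1 >= max_hops:
            continue
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in paths:
                paths[neighbour] = paths[node] + [neighbour]
                queue.append(neighbour)

    return {fam: paths[fam] for fam in known_fold_families if fam in paths}


if __name__ == "__main__":
    # Toy data: FamX has no direct hit to FamA, but both hit designed_1.
    toy_hits = [
        ("FamX", "designed_1"),
        ("designed_1", "FamA"),
        ("FamB", "designed_2"),
    ]
    print(find_linked_families(toy_hits, "FamX", {"FamA", "FamB"}))
```

In this toy example, `FamX` is associated with `FamA` only through the designed sequence, while `FamB` stays unlinked because no hit connects it to the query. A real analysis would also need hit significance thresholds and alignment coverage checks, which are omitted here.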

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d78/5960202/a29e167286f2/13062_2018_209_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d78/5960202/1d1a084dd905/13062_2018_209_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d78/5960202/f3a595ddc4b8/13062_2018_209_Fig3_HTML.jpg
