
An assessment of the amount of untapped fold level novelty in under-sampled areas of the tree of life.

Author information

Barry Roche Daniel, Brüls Thomas

Affiliations

Laboratoire de Génomique et Biochimie du Métabolisme, Genoscope, Institut de Génomique, Commissariat à l'Energie Atomique et aux Energies Alternatives, Evry, Essonne, 91057, France.

UMR 8030 - Génomique Métabolique, Centre National de la Recherche Scientifique, Evry, Essonne, 91057, France.

Publication information

Sci Rep. 2015 Oct 5;5:14717. doi: 10.1038/srep14717.

DOI: 10.1038/srep14717
PMID: 26434770
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4592975/
Abstract

Previous studies of protein fold space suggest that fold coverage is plateauing. However, sequence sampling has been, and to a large extent remains, heavily biased towards culturable phyla. Sustained technological developments have fuelled the advent of metagenomics and single-cell sequencing, which might correct the current sequencing bias. The extent to which these efforts affect structural diversity remains unclear, although preliminary results suggest that uncultured organisms could constitute a source of new folds. We investigate to what extent genomes from uncultured and under-sampled phyla, accessed through single-cell sequencing, metagenomics and high-throughput culturing efforts, have the potential to increase protein fold space, and conclude that i) genomes from under-sampled phyla appear enriched in sequences not covered by current protein family and fold profile libraries, ii) this enrichment is linked to an excess of short (and possibly partly spurious) sequences in some of the datasets, and iii) the discovery rate of novel folds among sequences not covered by current fold and family profile libraries may be as high as 36%, but would ultimately translate into a marginal increase in the global discovery of novel folds. Thus, genomes from under-sampled phyla should have a rather limited impact on increasing coarse-grained, tertiary-structure-level novelty.
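Point iii) contrasts a conditional rate (novel folds among sequences that current profile libraries do not cover) with a global one. The back-of-the-envelope sketch below assumes that the global yield is simply the product of the uncovered fraction and that conditional discovery rate. Under this assumption, a high conditional rate can still translate into a small absolute gain. This is an illustration, not the authors' method. Only the 36% figure comes from the abstract; the uncovered fraction, the dataset size and the function names are hypothetical placeholders.

```python
"""Back-of-the-envelope sketch (hypothetical, not the authors' method).

Assumes the share of sequences pointing to a new fold is the product of
(a) the fraction not covered by family/fold profile libraries and
(b) the discovery rate of novel folds among those uncovered sequences.
"""


def novel_fold_share(uncovered_fraction: float, discovery_rate: float) -> float:
    """Return the fraction of *all* sequences expected to indicate a new fold."""
    for name, value in (("uncovered_fraction", uncovered_fraction),
                        ("discovery_rate", discovery_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return uncovered_fraction * discovery_rate


def expected_novel_sequences(n_sequences: int, uncovered_fraction: float,
                             discovery_rate: float) -> float:
    """Expected number of sequences (not distinct folds) hinting at a new fold."""
    if n_sequences < 0:
        raise ValueError("n_sequences must be non-negative")
    return n_sequences * novel_fold_share(uncovered_fraction, discovery_rate)


if __name__ == "__main__":
    DISCOVERY_RATE = 0.36        # upper bound quoted in the abstract
    UNCOVERED_FRACTION = 0.05    # HYPOTHETICAL placeholder, not from the paper
    N_SEQUENCES = 10_000         # HYPOTHETICAL dataset size

    share = novel_fold_share(UNCOVERED_FRACTION, DISCOVERY_RATE)
    count = expected_novel_sequences(N_SEQUENCES, UNCOVERED_FRACTION, DISCOVERY_RATE)
    print(f"share of all sequences: {share:.1%}")   # 1.8%
    print(f"expected sequences:     {count:.0f}")   # 180
```

Sequences that share a fold are counted separately here. The expected count is therefore an upper bound on the number of distinct new folds among them, and it ignores the possibly spurious short sequences mentioned in point ii).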

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/148d/4592975/87db1bc5c107/srep14717-f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/148d/4592975/b9f62ae0c711/srep14717-f1.jpg

Similar articles

1. The enzymatic nature of an anonymous protein sequence cannot reliably be inferred from superfamily level structural information alone.
Protein Sci. 2015 May;24(5):643-50. doi: 10.1002/pro.2635. Epub 2015 Jan 28.
2. Cell surface proteins in archaeal and bacterial genomes comprising "LVIVD", "RIVW" and "LGxL" tandem sequence repeats are predicted to fold as beta-propeller.
Int J Biol Macromol. 2007 Oct 1;41(4):454-68. doi: 10.1016/j.ijbiomac.2007.06.004. Epub 2007 Jun 17.
3. Conformational analysis of invariant peptide sequences in bacterial genomes.
J Mol Biol. 2005 Feb 4;345(5):937-55. doi: 10.1016/j.jmb.2004.11.008. Epub 2004 Dec 16.
4. On the possible amyloid origin of protein folds.
J Mol Biol. 2012 Aug 24;421(4-5):417-26. doi: 10.1016/j.jmb.2012.04.015. Epub 2012 Apr 24.
5. Identification of a new family of putative PD-(D/E)XK nucleases with unusual phylogenomic distribution and a new type of the active site.
BMC Genomics. 2005 Feb 18;6:21. doi: 10.1186/1471-2164-6-21.
6. A comprehensive update of the sequence and structure classification of kinases.
BMC Struct Biol. 2005 Mar 16;5:6. doi: 10.1186/1472-6807-5-6.
7. A path from primary protein sequence to ligand recognition.
Proteins. 2003 Mar 1;50(4):589-99. doi: 10.1002/prot.10316.
8. Tracking polypeptide folds on the free energy surface: effects of the chain length and sequence.
J Phys Chem B. 2012 Jul 26;116(29):8703-13. doi: 10.1021/jp300990k. Epub 2012 Jun 11.
9. Topological frustration in beta alpha-repeat proteins: sequence diversity modulates the conserved folding mechanisms of alpha/beta/alpha sandwich proteins.
J Mol Biol. 2010 Apr 30;398(2):332-50. doi: 10.1016/j.jmb.2010.03.001. Epub 2010 Mar 11.

Cited by

1. A review of visualisations of protein fold networks and their relationship with sequence and function.
Biol Rev Camb Philos Soc. 2023 Feb;98(1):243-262. doi: 10.1111/brv.12905. Epub 2022 Oct 9.
2. Benchmarking the next generation of homology inference tools.
Bioinformatics. 2016 Sep 1;32(17):2636-41. doi: 10.1093/bioinformatics/btw305. Epub 2016 Jun 1.
