An assessment of the amount of untapped fold level novelty in under-sampled areas of the tree of life.

Author information

Barry Roche Daniel, Brüls Thomas

Affiliations

Laboratoire de Génomique et Biochimie du Métabolisme, Genoscope, Institut de Génomique, Commissariat à l'Energie Atomique et aux Energies Alternatives, Evry, Essonne, 91057, France.

UMR 8030 - Génomique Métabolique, Centre National de la Recherche Scientifique, Evry, Essonne, 91057, France.

Publication information

Sci Rep. 2015 Oct 5;5:14717. doi: 10.1038/srep14717.

Abstract

Previous studies of protein fold space suggest that fold coverage is plateauing. However, sequence sampling has been, and to a large extent remains, heavily biased towards culturable phyla. Sustained technological developments have fuelled the advent of metagenomics and single-cell sequencing, which might correct the current sequencing bias. The extent to which these efforts affect structural diversity remains unclear, although preliminary results suggest that uncultured organisms could constitute a source of new folds. We investigate to what extent genomes from uncultured and under-sampled phyla, accessed through single-cell sequencing, metagenomics and high-throughput culturing efforts, have the potential to expand protein fold space. We conclude that (i) genomes from under-sampled phyla appear enriched in sequences not covered by current protein family and fold profile libraries; (ii) this enrichment is linked to an excess of short (and possibly partly spurious) sequences in some of the datasets; and (iii) the discovery rate of novel folds among sequences not covered by current fold and family profile libraries may be as high as 36%, but would ultimately translate into only a marginal increase in the global discovery of novel folds. Thus, genomes from under-sampled phyla should have a rather limited impact on increasing coarse-grained, tertiary-structure-level novelty.
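The arithmetic behind conclusion (iii) can be illustrated with a minimal sketch. It assumes that novel folds can only arise among sequences not covered by existing profile libraries, so the global novel-fold fraction is the uncovered fraction multiplied by the novelty rate among uncovered sequences. The 36% upper bound comes from the abstract; the uncovered fractions and the function name `global_novel_fold_fraction` are hypothetical and are not values or methods reported by the authors.

```python
def global_novel_fold_fraction(uncovered_fraction: float,
                               novel_rate_among_uncovered: float) -> float:
    """Fraction of ALL sequences expected to carry a novel fold.

    Assumes novel folds occur only among sequences that current
    fold/family profile libraries do not cover.
    """
    for value in (uncovered_fraction, novel_rate_among_uncovered):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    return uncovered_fraction * novel_rate_among_uncovered


# Upper bound quoted in the abstract.
NOVEL_RATE_UPPER_BOUND = 0.36

# Hypothetical uncovered fractions, chosen only for illustration.
for uncovered in (0.01, 0.05, 0.10):
    overall = global_novel_fold_fraction(uncovered, NOVEL_RATE_UPPER_BOUND)
    print(f"uncovered={uncovered:.0%} -> global novel-fold fraction={overall:.2%}")
```

Under these illustrative inputs, even the upper-bound rate yields a global fraction of a few percent at most, which is consistent with the "marginal increase" wording, although the paper's actual estimate depends on its measured coverage figures.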

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/148d/4592975/b9f62ae0c711/srep14717-f1.jpg
