Quantifying and Cataloguing Unknown Sequences within Human Microbiomes.

Affiliation

MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom.

Publication information

mSystems. 2022 Apr 26;7(2):e0146821. doi: 10.1128/msystems.01468-21. Epub 2022 Mar 8.

Abstract

Advances in genome sequencing technologies and lower costs have enabled the exploration of a multitude of known and novel environments and microbiomes. This has led to exponential growth in the raw sequence data deposited in online repositories. Metagenomic and metatranscriptomic data sets are typically analysed with regard to a specific biological question. However, it is widely acknowledged that these data sets contain a proportion of sequences that bear no similarity to any currently known biological sequence, and this so-called "dark matter" is often excluded from downstream analyses. In this study, a systematic framework was developed to assemble, identify, and measure the proportion of unknown sequences present in distinct human microbiomes. This framework was applied to 40 distinct studies, comprising 963 samples and covering 10 different human microbiomes, including fecal, oral, lung, skin, and circulatory system microbiomes. We found that although the human microbiome is one of the most extensively studied microbiomes, on average 2% of assembled sequences have not yet been taxonomically defined. However, this proportion varied extensively among microbiomes and was as high as 25% for skin and oral microbiomes, which have more interactions with the environment. From the taxonomically unknown sequences discovered in this study, we calculated a rate of taxonomic characterization of 1.64% of unknown sequences per month. A cross-study comparison identified similar unknown sequences in different samples and/or microbiomes. Both our computational framework and the novel unknown sequences it produced are publicly available for future cross-referencing. Our approach led to the discovery of several novel viral genomes that bear no similarity to sequences in the public databases. Some of these are widespread, having been found in different microbiomes and studies.
Hence, our study illustrates how the systematic characterization of unknown sequences can aid the discovery of novel microbes, and we call on the research community to systematically collate and share the unknown sequences from metagenomic studies to increase the rate at which the unknown sequence space can be classified.
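The abstract's two headline figures reduce to simple ratios. The first is the share of assembled sequences with no taxonomic assignment. The second is a per-month characterization rate.

The Python sketch below is not the authors' framework. It rests on these assumptions:

- Each assembled contig has already been assigned either a taxon name or `None`, where `None` means no similarity to any reference.
- The rate is a simple linear average: the percentage of an initial unknown set that later gained a classification, divided by the months elapsed.

The function names, inputs, and numbers are hypothetical toy data.

```python
from typing import Optional, Sequence


def unknown_fraction(assignments: Sequence[Optional[str]]) -> float:
    """Fraction of contigs with no taxonomic assignment.

    Hypothetical input: one entry per assembled contig, holding a taxon
    name, or None when the contig matched nothing in the reference databases.
    """
    if not assignments:
        raise ValueError("no contigs supplied")
    unknown = sum(1 for taxon in assignments if taxon is None)
    return unknown / len(assignments)


def monthly_characterization_rate(
    n_unknown_initially: int, n_now_classified: int, months_elapsed: float
) -> float:
    """Percentage of an initial unknown set classified per month.

    Assumption (not from the paper): the rate is a linear average, i.e.
    the percentage later classified divided by the months elapsed.
    """
    if n_unknown_initially <= 0 or months_elapsed <= 0:
        raise ValueError("counts and elapsed time must be positive")
    return 100.0 * n_now_classified / n_unknown_initially / months_elapsed


if __name__ == "__main__":
    # Toy data: 50 contigs, one of which has no taxonomic assignment.
    contigs = [None] + ["Bacteroides"] * 49
    print(unknown_fraction(contigs))  # → 0.02

    # Toy data: 12 of 200 unknown contigs classified after 6 months.
    print(monthly_characterization_rate(200, 12, 6))  # → 1.0
```

A linear average is the simplest reading of "per month". If re-queries were spread unevenly over time, a rate fitted across several time points would be more robust.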

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7efc/9052204/796f5fb0237d/msystems.01468-21-f001.jpg