Benchmarking viromics: an evaluation of metagenome-enabled estimates of viral community composition and diversity.

Authors

Roux Simon, Emerson Joanne B, Eloe-Fadrosh Emiley A, Sullivan Matthew B

Affiliations

Department of Microbiology, Ohio State University, Columbus, OH, United States of America.

Joint Genome Institute, Department of Energy, Walnut Creek, CA, United States of America.

Publication information

PeerJ. 2017 Sep 21;5:e3817. doi: 10.7717/peerj.3817. eCollection 2017.

Abstract

BACKGROUND

Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates.

RESULTS

Assemblers designed specifically for metagenomes, namely metaSPAdes, MEGAHIT, and IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2-5× coverage typically assembled well, whereas lower coverage led to fragmented assemblies. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI, ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following detection thresholds were applied: (i) contig lengths of ≥10 kb to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of the contig length covered at ≥1×. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but estimates diverged more (up to 80%) when sequencing depth was uneven across the dataset. In the latter cases, normalization methods specifically developed for metagenomes provided the best estimates.
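The three detection thresholds above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Alignment` record, the `detect_populations` function, and the use of mean per-base depth as an abundance proxy are assumptions made for illustration, and a real workflow would more likely derive these values from a read mapper's BAM output.

```python
from dataclasses import dataclass

# Thresholds taken from the abstract.
MIN_CONTIG_LEN = 10_000       # (i) contigs of >=10 kb define populations
MIN_READ_IDENTITY = 90.0      # (ii) only reads mapping at >=90% identity count
MIN_COVERED_FRACTION = 0.75   # (iii) >=75% of contig length covered at >=1x


@dataclass
class Alignment:
    """Hypothetical minimal read alignment record (not a real library type)."""
    contig: str
    start: int        # 0-based, inclusive
    end: int          # 0-based, exclusive
    identity: float   # percent identity of the read to the contig


def detect_populations(contig_lengths, alignments):
    """Return {contig: mean depth} for contigs passing all three thresholds.

    Mean depth is used here as a simple abundance proxy (an assumption).
    """
    # Threshold (i): track per-base depth only for contigs that are long enough.
    depth = {
        name: [0] * length
        for name, length in contig_lengths.items()
        if length >= MIN_CONTIG_LEN
    }

    for aln in alignments:
        # Threshold (ii): skip low-identity reads and reads on excluded contigs.
        if aln.identity < MIN_READ_IDENTITY or aln.contig not in depth:
            continue
        per_base = depth[aln.contig]
        for pos in range(max(aln.start, 0), min(aln.end, len(per_base))):
            per_base[pos] += 1

    detected = {}
    for name, per_base in depth.items():
        covered = sum(1 for d in per_base if d >= 1)
        # Threshold (iii): require enough of the contig to be covered.
        if covered / len(per_base) >= MIN_COVERED_FRACTION:
            detected[name] = sum(per_base) / len(per_base)
    return detected


if __name__ == "__main__":
    contig_lengths = {"contig_A": 12_000, "contig_B": 12_000, "contig_C": 5_000}
    alignments = [
        Alignment("contig_A", 0, 10_000, 98.0),      # covers ~83% of contig_A
        Alignment("contig_A", 5_000, 10_000, 95.0),
        Alignment("contig_B", 0, 12_000, 85.0),      # identity too low, ignored
        Alignment("contig_B", 0, 6_000, 99.0),       # covers only 50% of contig_B
        Alignment("contig_C", 0, 5_000, 99.0),       # contig shorter than 10 kb
    ]
    print(detect_populations(contig_lengths, alignments))
    # prints {'contig_A': 1.25}
```

In this toy input only `contig_A` is reported: `contig_B` fails the covered-fraction threshold once its low-identity read is discarded, and `contig_C` is too short to define a population. Contigs that fail any threshold are treated as undetected rather than assigned a low abundance.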

CONCLUSIONS

These simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better access RNA, rare, and/or diverse viral populations, together with improved availability of reference viral genomes, will alleviate many of viromics' remaining limitations.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eac0/5610896/0b2faceeee72/peerj-05-3817-g001.jpg
