Accurate genome relative abundance estimation based on shotgun metagenomic reads.

Affiliation

Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America.

Publication information

PLoS One. 2011;6(12):e27992. doi: 10.1371/journal.pone.0027992. Epub 2011 Dec 6.

Abstract

Accurate estimation of microbial community composition from metagenomic sequencing data is fundamental to subsequent metagenomic analysis. Prevalent estimation methods mainly summarize alignment results directly, or use variants of that approach, and often yield biased and/or unstable estimates. We have developed a unified probabilistic framework, GRAMMy (Genome Relative Abundance using Mixture Model theory), that explicitly models read assignment ambiguities, genome size biases, and read distributions along the genomes. A maximum likelihood method is used to compute the genome relative abundance of microbial communities. GRAMMy has been shown to give accurate and robust estimates across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (at least 0.5% abundant in at least 50% of the datasets) in the human gut samples. Our results show substantial improvements over previous studies, such as correcting the overestimated abundance of Bacteroides species in human gut samples, and provide a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment, or composition-based), even with low-sensitivity mapping results from huge short-read datasets. As read sets grow and the database of reference genomes expands, it will become increasingly useful as an accurate and robust tool for abundance estimation.
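The mixture-model idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It is a generic expectation-maximization (EM) fit of mixing proportions for a finite mixture with fixed components, followed by a genome-length correction. The function name `estimate_abundance`, the `read_probs` input (an assumed probability of each read given each reference genome, which would in practice be derived from a read assignment tool), and the toy numbers are all hypothetical and chosen only for illustration.

```python
def estimate_abundance(read_probs, genome_lengths, n_iter=200, tol=1e-10):
    """Illustrative EM for a finite mixture with fixed components.

    read_probs[i][j]  -- assumed probability of read i given genome j
                         (0.0 when the read has no hit on that genome)
    genome_lengths[j] -- length of reference genome j, in bases

    Returns (pi, abundance): pi[j] is the estimated fraction of reads
    from genome j; abundance[j] is pi[j] corrected for genome length
    and renormalized, so it reflects relative genome copy number.
    """
    n_genomes = len(genome_lengths)
    # Reads with no hit to any reference carry no information here.
    reads = [row for row in read_probs if sum(row) > 0]
    if not reads:
        raise ValueError("no read has a hit on any reference genome")

    pi = [1.0 / n_genomes] * n_genomes  # uniform starting point
    for _ in range(n_iter):
        totals = [0.0] * n_genomes
        for row in reads:
            # E-step: posterior responsibility of each genome for this read.
            weights = [p * q for p, q in zip(pi, row)]
            norm = sum(weights)
            for j, w in enumerate(weights):
                totals[j] += w / norm
        # M-step: new mixing proportions are the mean responsibilities.
        new_pi = [t / len(reads) for t in totals]
        delta = max(abs(a - b) for a, b in zip(new_pi, pi))
        pi = new_pi
        if delta < tol:
            break

    # Longer genomes yield more reads per cell, so divide by length
    # before renormalizing to remove the genome size bias.
    per_base = [p / length for p, length in zip(pi, genome_lengths)]
    total = sum(per_base)
    abundance = [x / total for x in per_base]
    return pi, abundance


# Toy data: 60 reads hit only genome A, 20 hit only genome B,
# and 20 are ambiguous with equal probability on both.
toy_reads = [[1.0, 0.0]] * 60 + [[0.0, 1.0]] * 20 + [[1.0, 1.0]] * 20
toy_lengths = [1000, 3000]  # genome B is three times longer than A

pi, abundance = estimate_abundance(toy_reads, toy_lengths)
print([round(x, 4) for x in pi])
print([round(x, 4) for x in abundance])
```

In this toy case the ambiguous reads are split in proportion to the current estimates rather than counted in full for both genomes, and the length correction raises the share of the shorter genome: the read fractions converge to about 0.75 and 0.25, while the length-corrected abundances come out at about 0.9 and 0.1.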

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5e29/3232206/141147389bb1/pone.0027992.g001.jpg
