Genovo: de novo assembly for metagenomes.

Author information

Laserson Jonathan, Jojic Vladimir, Koller Daphne

Affiliation

Department of Computer Science, Stanford University, Stanford, California, USA.

Publication information

J Comput Biol. 2011 Mar;18(3):429-43. doi: 10.1089/cmb.2010.0244.

Abstract

Next-generation sequencing technologies produce a large number of noisy reads from the DNA in a sample. Metagenomics and population sequencing aim to recover the genomic sequences of the species in the sample, which could be of high diversity. Methods geared towards single sequence reconstruction are not sensitive enough when applied in this setting. We introduce a generative probabilistic model of read generation from environmental samples and present Genovo, a novel de novo sequence assembler that discovers likely sequence reconstructions under the model. A nonparametric prior accounts for the unknown number of genomes in the sample. Inference is performed by applying a series of hill-climbing steps iteratively until convergence. We compare the performance of Genovo to three other short read assembly programs in a series of synthetic experiments and across nine metagenomic datasets created using the 454 platform, the largest of which has 311k reads. Genovo's reconstructions cover more bases and recover more genes than the other methods, even for low-abundance sequences, and yield a higher assembly score. Supplementary Material is available at www.liebertoinline.com/cmb .
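The abstract describes the generative model and the hill-climbing inference only at a high level. The Python sketch below illustrates the general idea under stated assumptions and is not Genovo's implementation. It assumes a Chinese restaurant process (CRP) as the nonparametric prior that spreads reads over an unknown number of source sequences, uniformly random read offsets, and a substitution-only error model. That last assumption is a simplification, since 454 reads are dominated by homopolymer insertion and deletion errors. The sketch also shows one hill-climbing move: with reads held at their offsets, each position of a sequence is set to its majority base, which cannot lower the likelihood under this error model when the error rate is below 3/4. All function names (`crp_assign`, `generate_reads`, `consensus`) are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

BASES = "ACGT"


def crp_assign(n_reads: int, alpha: float, rng: random.Random) -> List[int]:
    """Assign reads to sequences with a Chinese restaurant process (assumed prior).

    Read i joins existing sequence k with probability n_k / (i + alpha)
    or opens a new sequence with probability alpha / (i + alpha), so the
    number of sequences is not fixed in advance.
    """
    counts: List[int] = []
    labels: List[int] = []
    for _ in range(n_reads):
        weights = counts + [alpha]  # last slot means "open a new sequence"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)
        counts[k] += 1
        labels.append(k)
    return labels


def generate_reads(n_reads: int, read_len: int, seq_len: int, alpha: float,
                   error_rate: float, seed: int = 0
                   ) -> Tuple[List[str], List[Tuple[int, int, str]]]:
    """Sample hypothetical source sequences and noisy reads drawn from them.

    Returns the sequences and a list of (sequence index, offset, read).
    """
    rng = random.Random(seed)
    labels = crp_assign(n_reads, alpha, rng)
    seqs = ["".join(rng.choice(BASES) for _ in range(seq_len))
            for _ in range(max(labels, default=-1) + 1)]
    reads = []
    for k in labels:
        offset = rng.randrange(seq_len - read_len + 1)  # uniform start position
        fragment = seqs[k][offset:offset + read_len]
        # Substitution-only noise (a simplifying assumption): each base is
        # replaced by one of the other three bases with probability error_rate.
        noisy = "".join(
            rng.choice([b for b in BASES if b != base])
            if rng.random() < error_rate else base
            for base in fragment)
        reads.append((k, offset, noisy))
    return seqs, reads


def consensus(seq_len: int, placed_reads: List[Tuple[int, str]]) -> str:
    """One hill-climbing move: set each position to its majority base.

    With reads held fixed at their offsets and error_rate < 3/4, the
    majority base maximizes each column's likelihood, so this update
    never lowers the likelihood. Uncovered positions are left as 'N'.
    """
    columns = [Counter() for _ in range(seq_len)]
    for offset, read in placed_reads:
        for j, base in enumerate(read):
            columns[offset + j][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)
```

In this sketch the read offsets come straight from the simulation; an assembler would also have to infer them, alternating moves of this kind until the score stops improving.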
