Use of simulated data sets to evaluate the fidelity of metagenomic processing methods.

Authors

Mavromatis Konstantinos, Ivanova Natalia, Barry Kerrie, Shapiro Harris, Goltsman Eugene, McHardy Alice C, Rigoutsos Isidore, Salamov Asaf, Korzeniewski Frank, Land Miriam, Lapidus Alla, Grigoriev Igor, Richardson Paul, Hugenholtz Philip, Kyrpides Nikos C

Affiliation

Department of Energy Joint Genome Institute (DOE-JGI), 2800 Mitchell Drive, Walnut Creek, California 94598, USA.

Publication information

Nat Methods. 2007 Jun;4(6):495-500. doi: 10.1038/nmeth1043. Epub 2007 Apr 29.

Abstract

Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
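The core idea, pooling reads drawn at random from isolate genomes while remembering where each read came from, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline. The names `sample_reads`, `read_pools`, `abundances` and `bin_purity` are hypothetical. Abundances here are arbitrary relative weights. Bin purity is just one possible fidelity measure. Because every sampled read keeps its source-genome label, the output of any downstream step (assembly, gene calling, binning) can be scored against the known origin.

```python
import random
from collections import Counter


def sample_reads(read_pools, abundances, n_reads, seed=0):
    """Build a labeled simulated metagenome from per-genome read pools.

    read_pools: dict genome -> list of reads (hypothetical input format)
    abundances: dict genome -> relative weight (assumed, not from the paper)
    Returns a shuffled list of (genome, read) pairs. Per-genome counts are
    rounded, so the total can differ slightly from n_reads.
    """
    rng = random.Random(seed)
    total = sum(abundances.values())
    sample = []
    for genome, weight in abundances.items():
        k = round(n_reads * weight / total)
        k = min(k, len(read_pools[genome]))  # cannot draw more reads than exist
        # Sample without replacement so no read appears twice.
        for read in rng.sample(read_pools[genome], k):
            sample.append((genome, read))
    rng.shuffle(sample)  # mix reads so order carries no origin signal
    return sample


def bin_purity(bins):
    """Fraction of items whose true genome is the majority genome of their bin.

    bins: dict bin_id -> list of true genome labels of the items in that bin.
    """
    total = sum(len(labels) for labels in bins.values())
    if total == 0:
        return 0.0
    correct = sum(
        Counter(labels).most_common(1)[0][1] for labels in bins.values() if labels
    )
    return correct / total
```

For example, with ten reads per genome and weights `{"A": 3, "B": 1}`, requesting eight reads yields six from A and two from B. A binning that places `["A", "A", "B"]` in one bin and `["B", "B"]` in another has a purity of 4/5 = 0.8.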
