May Ali, Abeln Sanne, Buijs Mark J, Heringa Jaap, Crielaard Wim, Brandt Bernd W
Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands; Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands.
Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands; AIMMS Amsterdam Institute for Molecules Medicines and Systems, VU University Amsterdam, Amsterdam, The Netherlands.
Nucleic Acids Res. 2015 Jul 1;43(W1):W301-5. doi: 10.1093/nar/gkv346. Epub 2015 Apr 15.
Massively parallel sequencing of microbial genetic markers (MGMs) is used to uncover the species composition in a multitude of ecological niches. These sequencing runs often contain a sample with known composition that can be used to evaluate the sequencing quality or to detect novel sequence variants. With NGS-eval, the reads from such (mock) samples can be used to (i) explore the differences between the reads and their references and to (ii) estimate the sequencing error rate. This tool maps these reads to references and calculates as well as visualizes the different types of sequencing errors. Clearly, sequencing errors can only be accurately calculated if the reference sequences are correct. However, even with known strains, it is not straightforward to select the correct references from databases. We previously analysed a pyrosequencing dataset from a mock sample to estimate sequencing error rates and detected sequence variants in our mock community, allowing us to obtain an accurate error estimation. Here, we demonstrate the variant detection and error analysis capability of NGS-eval with Illumina MiSeq reads from the same mock community. While tailored towards the field of metagenomics, this server can be used for any type of MGM-based reads. NGS-eval is available at http://www.ibi.vu.nl/programs/ngsevalwww/.
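The error calculation described above (comparing mapped reads against their references and tallying the different error types) can be illustrated with a minimal sketch. This is not NGS-eval's implementation, which the abstract does not specify. It assumes each read has already been aligned to its reference as two equal-length strings with `-` marking gaps, and the function names `count_errors` and `error_rate` are hypothetical. The error rate here is defined as errors divided by all aligned columns, which is one of several possible conventions.

```python
from collections import Counter


def count_errors(aligned_ref: str, aligned_read: str) -> Counter:
    """Tally matches, substitutions, insertions and deletions in one
    pairwise alignment (gap character '-'). Hypothetical helper."""
    if len(aligned_ref) != len(aligned_read):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter(match=0, substitution=0, insertion=0, deletion=0)
    for ref_base, read_base in zip(aligned_ref, aligned_read):
        if ref_base == "-" and read_base == "-":
            continue  # column carries no information
        if ref_base == "-":
            counts["insertion"] += 1      # extra base in the read
        elif read_base == "-":
            counts["deletion"] += 1       # reference base missing from the read
        elif ref_base == read_base:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    return counts


def error_rate(counts: Counter) -> float:
    """Errors per aligned column (assumed definition, see lead-in)."""
    errors = counts["substitution"] + counts["insertion"] + counts["deletion"]
    total = errors + counts["match"]
    return errors / total if total else 0.0


if __name__ == "__main__":
    counts = count_errors("ACGT-ACGT", "ACCTTAC-T")
    print(dict(counts))
    print(round(error_rate(counts), 4))
```

Such a calculation is only meaningful when the reference is correct: a true sequence variant in the mock strain would be counted as a substitution in every read, inflating the estimated error rate.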