Eren A Murat, Morrison Hilary G, Huse Susan M, Sogin Mitchell L
Brief Bioinform. 2014 Sep;15(5):783-7. doi: 10.1093/bib/bbt010. Epub 2013 May 22.
The extremely high error rates reported by Keegan et al. in 'A platform-independent method for detecting errors in metagenomic sequencing data: DRISEE' (PLoS Comput Biol 2012; 8: e1002541) for many next-generation sequencing datasets prompted us to re-examine their results. Our analysis reveals that the presence of conserved artificial sequences, e.g. Illumina adapters, and other naturally occurring sequence motifs accounts for most of the reported errors. We conclude that DRISEE reports inflated levels of sequencing error, particularly for Illumina data. Tools offered for evaluating large datasets need scrupulous review before they are implemented.