Research Group Bioinformatics (NG4), Robert Koch-Institut, Nordufer 20, 13353 Berlin, Germany.
Bioinformatics. 2014 Jan 1;30(1):9-16. doi: 10.1093/bioinformatics/btt255. Epub 2013 May 17.
Accurate estimation, comparison and evaluation of read mapping error rates is a crucial step in the processing of next-generation sequencing data, as further analysis steps and interpretation assume the correctness of the mapping results. Current approaches are either focused on sensitivity estimation and thereby disregard specificity or are based on read simulations. Although continuously improving, read simulations are still prone to introduce a bias into the mapping error quantitation and cannot capture all characteristics of an individual dataset.
We introduce ARDEN (artificial reference driven estimation of false positives in next-generation sequencing data), a novel benchmark method that estimates error rates of read mappers based on real experimental reads, using an additionally generated artificial reference genome. It allows a dataset-specific computation of error rates and the construction of a receiver operating characteristic curve. Thereby, it can be used for optimization of parameters for read mappers, selection of read mappers for a specific problem or for filtering alignments based on quality estimation. The use of ARDEN is demonstrated in a general read mapper comparison, a parameter optimization for one read mapper and an application example in single-nucleotide polymorphism discovery with a significant reduction in the number of false positive identifications.
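As a rough illustration of how a receiver operating characteristic curve could be built from such data, the following Python sketch takes two lists of mapping quality scores: one for alignments against the original reference and one for alignments against the artificial reference. It rests on a simplifying assumption that is not stated in the abstract: every alignment to the artificial reference is counted as a false positive and every alignment to the original reference as a putative true positive. The function names `roc_points` and `auc` and the example scores are hypothetical and are not part of the ARDEN code.

```python
from typing import List, Sequence, Tuple


def roc_points(real_scores: Sequence[float],
               artificial_scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs.

    Assumption for this sketch: alignments to the artificial reference are
    false positives; alignments to the original reference are true positives.
    One point is produced per distinct score threshold, from strict to lenient.
    """
    thresholds = sorted(set(real_scores) | set(artificial_scores), reverse=True)
    n_real = len(real_scores)
    n_art = len(artificial_scores)
    points = [(0.0, 0.0)]  # strictest possible filter: nothing is accepted
    for t in thresholds:
        # Count alignments that survive a "score >= t" filter.
        tp = sum(1 for s in real_scores if s >= t)
        fp = sum(1 for s in artificial_scores if s >= t)
        fpr = fp / n_art if n_art else 0.0
        tpr = tp / n_real if n_real else 0.0
        points.append((fpr, tpr))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule over consecutive points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Made-up mapping qualities, for illustration only.
    real = [60, 60, 40, 30]
    artificial = [30, 10]
    curve = roc_points(real, artificial)
    print(curve)
    print(auc(curve))
```

Under these assumptions, a quality cutoff for filtering alignments could be read off the curve by choosing the threshold that keeps the false positive rate below an acceptable level for the dataset at hand.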
The ARDEN source code is freely available at http://sourceforge.net/projects/arden/.