Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8562, Japan.
Bioinformatics. 2011 Nov 15;27(22):3085-92. doi: 10.1093/bioinformatics/btr537. Epub 2011 Oct 5.
Recent studies have shown that the quality scores of reads generated by next-generation sequencing (NGS) platforms matter in various downstream analyses. It is also known that probabilistic alignments based on marginal probabilities (e.g. aligned-column and/or gap probabilities) are more accurate than conventional maximum-score alignments. However, no study has addressed probabilistic alignment that considers quality scores explicitly, even though such a method is expected to be useful in SNP/indel callers and bisulfite mapping, where accurate estimation of aligned columns or gaps is important.
In this study, we propose methods of probabilistic alignment that consider the quality scores of (one of) the sequences as well as the usual score matrix. The methods are based on posterior decoding techniques, in which various marginal probabilities are computed from a probabilistic model of alignments with quality scores, and they can arbitrarily trade off sensitivity against positive predictive value (PPV) in predicting aligned columns and gaps. The methods are directly applicable to read mapping (alignment) for accurate detection of SNPs and indels. Several computational experiments indicated that probabilistic alignments estimate aligned columns and gaps more accurately than other mapping algorithms such as SHRiMP2, Stampy, BWA and Novoalign. The study also suggested that our approach yields favorable precision for SNP/indel calling.
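To make the idea of posterior decoding with quality scores concrete, the following is a minimal Python sketch and not the implementation evaluated in the study. It makes several assumptions that do not come from the abstract: a single-state global alignment model with one constant gap weight, illustrative match and mismatch likelihood ratios (`match=4.0`, `mismatch=0.25`), a simple mixture that spreads a base-call error evenly over the three other bases, and hypothetical function names. Aligned-column posteriors are obtained from forward and backward sums over all alignments, and a parameter `gamma` sets the reporting threshold at 1/(gamma + 1), so a larger `gamma` favors sensitivity and a smaller one favors PPV.

```python
def phred_to_error(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10.0 ** (-q / 10.0)


def column_weight(read_base, ref_base, err, match=4.0, mismatch=0.25):
    """Quality-aware weight for aligning read_base to ref_base.

    Assumption: the true base equals read_base with probability 1 - err,
    and each of the other three bases with probability err / 3.
    match and mismatch are illustrative likelihood ratios.
    """
    def ratio(base):
        return match if base == ref_base else mismatch

    others = [b for b in "ACGT" if b != read_base]
    return (1.0 - err) * ratio(read_base) + (err / 3.0) * sum(ratio(b) for b in others)


def forward_backward(read, quals, ref, gap=0.05):
    """Return (w, F, B): column weights plus forward and backward sums."""
    n, m = len(read), len(ref)
    w = [[column_weight(read[i], ref[j], phred_to_error(quals[i]))
          for j in range(m)] for i in range(n)]

    # F[i][j]: total weight of all alignments of read[:i] with ref[:j].
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                F[i][j] += F[i - 1][j - 1] * w[i - 1][j - 1]
            if i > 0:
                F[i][j] += F[i - 1][j] * gap
            if j > 0:
                F[i][j] += F[i][j - 1] * gap

    # B[i][j]: total weight of all alignments of read[i:] with ref[j:].
    B = [[0.0] * (m + 1) for _ in range(n + 1)]
    B[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                B[i][j] += w[i][j] * B[i + 1][j + 1]
            if i < n:
                B[i][j] += gap * B[i + 1][j]
            if j < m:
                B[i][j] += gap * B[i][j + 1]
    return w, F, B


def aligned_column_posteriors(read, quals, ref, gap=0.05):
    """Posterior probability that read[i] is aligned to ref[j] (0-based)."""
    w, F, B = forward_backward(read, quals, ref, gap)
    z = F[len(read)][len(ref)]  # sum over all global alignments
    return [[F[i][j] * w[i][j] * B[i + 1][j + 1] / z
             for j in range(len(ref))] for i in range(len(read))]


def decode_columns(post, gamma=1.0):
    """Report columns whose posterior exceeds 1 / (gamma + 1)."""
    threshold = 1.0 / (gamma + 1.0)
    return [(i, j) for i, row in enumerate(post)
            for j, p in enumerate(row) if p > threshold]


if __name__ == "__main__":
    post = aligned_column_posteriors("ACGT", [30, 30, 5, 30], "ACTT")
    print(decode_columns(post, gamma=0.5))  # stricter: favors PPV
    print(decode_columns(post, gamma=4.0))  # looser: favors sensitivity
```

In this sketch a low-quality mismatching base is penalized less than a high-quality one, which is the intuition behind using quality scores in the alignment model. Note that plain thresholding only guarantees a consistent alignment when `gamma` is at most 1 (a threshold of at least 0.5); for larger values the selected columns may conflict, and a further dynamic-programming pass would be needed to resolve them.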