Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba, Japan.
Artificial Intelligence Research Center, AIST, 2-3-26 Aomi, Koto-ku, Tokyo, Japan.
Bioinformatics. 2018 Nov 1;34(21):3631-3637. doi: 10.1093/bioinformatics/bty398.
Split-alignments provide base-pair-resolution evidence of genomic rearrangements. In practice, they are found by first computing high-scoring local alignments, parts of which are then combined into a split-alignment. This approach is challenging when aligning a short read to a large and repetitive reference, as it tends to produce many spurious local alignments leading to ambiguities in identifying the correct split-alignment. This problem is further exacerbated by the fact that rearrangements tend to occur in repeat-rich regions.
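The "combine parts of local alignments" step can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes each local alignment is reduced to a half-open interval on the read plus a score, and it picks the highest-scoring set of alignments that do not overlap on the read (classic weighted interval scheduling). The function name `best_split` and the tuple layout are hypothetical, and real tools typically also trim overlapping alignments rather than rejecting them outright.

```python
from bisect import bisect_right

def best_split(alignments):
    """Pick non-overlapping local alignments with the highest total score.

    alignments: list of (read_start, read_end, score, ref_name, ref_start),
                where [read_start, read_end) is a half-open interval on the read.
    Returns (total_score, chosen), with chosen ordered along the read.
    """
    alns = sorted(alignments, key=lambda a: a[1])  # sort by read_end
    ends = [a[1] for a in alns]
    best = [0.0] * (len(alns) + 1)   # best[i]: best score using the first i alignments
    take = [False] * len(alns)       # whether alignment i is used in best[i + 1]
    prev = [0] * len(alns)           # how many earlier alignments end before i starts

    for i, a in enumerate(alns):
        # Earlier alignments compatible with a: those ending at or before a's start.
        prev[i] = bisect_right(ends, a[0], 0, i)
        with_a = best[prev[i]] + a[2]
        if with_a > best[i]:
            best[i + 1] = with_a
            take[i] = True
        else:
            best[i + 1] = best[i]

    # Trace back through the table to recover the chosen alignments.
    chosen = []
    i = len(alns)
    while i > 0:
        if take[i - 1]:
            chosen.append(alns[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return best[-1], chosen
```

With many spurious candidates, several combinations can score almost equally under a scheme like this, which is the ambiguity the abstract describes.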
We propose a split-alignment technique that combats the issue of ambiguous alignments by combining information from probabilistic alignment with positional information from paired-end reads. We demonstrate that our method finds accurate split-alignments, and that this translates into improved performance of variant-calling tools that rely on split-alignments.
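One way to picture how mate position can resolve ambiguous candidates is the sketch below. It is a simplified illustration, not the last-split-pe implementation. It assumes fragment lengths follow a normal distribution with a known mean and standard deviation, that each candidate placement comes with an alignment likelihood, and that the mate is already placed on the same chromosome. All names and default values (`rescore_candidates`, `mean`, `sd`) are hypothetical.

```python
import math

def fragment_likelihood(frag_len, mean, sd):
    """Normal density of an observed fragment length (an assumed model)."""
    z = (frag_len - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def rescore_candidates(candidates, mate_pos, mean=300.0, sd=50.0):
    """Combine alignment likelihoods with a fragment-length likelihood.

    candidates: list of (ref_start, alignment_likelihood) for one read segment
    mate_pos:   reference position of the mate on the same chromosome
    Returns one normalized probability per candidate, in input order.
    """
    if not candidates:
        return []
    weights = [
        lik * fragment_likelihood(abs(start - mate_pos), mean, sd)
        for start, lik in candidates
    ]
    total = sum(weights)
    if total == 0.0:
        # No candidate is plausible under the model: fall back to uniform.
        return [1.0 / len(candidates)] * len(candidates)
    return [w / total for w in weights]
```

Two candidates with equal alignment likelihood are indistinguishable on their own, but the one lying a plausible fragment length from the mate receives nearly all of the probability after rescoring.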
An open-source implementation is freely available at: https://bitbucket.org/splitpairedend/last-split-pe.
Supplementary data are available at Bioinformatics online.