BGI Shenzhen, Shenzhen 518000, China.
Genome Res. 2013 Jan;23(1):195-200. doi: 10.1101/gr.132480.111. Epub 2012 Sep 12.
We present a new approach to indel calling that explicitly exploits the fact that indel differences between a reference and a sequenced sample make the mapping of reads less efficient. We assign all unmapped reads with a mapped partner to their expected genomic positions and then perform extensive de novo assembly on the regions with many unmapped reads to resolve homozygous, heterozygous, and complex indels by exhaustive traversal of the de Bruijn graph. The method is implemented in the software SOAPindel and provides a list of candidate indels with quality scores. We compare SOAPindel to Dindel, Pindel, and GATK on simulated data and find similar or better performance for short indels (<10 bp) and higher sensitivity and specificity for long indels. A validation experiment suggests that SOAPindel has a false-positive rate of ∼10% for long indels (>5 bp), while still providing many more candidate indels than other approaches.
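The pipeline described above (place unmapped mates, pick regions dense in such reads, assemble them into a de Bruijn graph, and traverse it exhaustively between flanking anchors) can be illustrated with a minimal Python sketch. This is not SOAPindel's implementation, and every name in it (`expected_mate_start`, `candidate_regions`, `build_de_bruijn`, `all_paths`, `path_to_sequence`, `local_assembly_indels`) is hypothetical. It rests on several simplifying assumptions: a single fixed insert size and equal read lengths for placement, fixed-size windows with a read-count threshold for region selection, one k-mer size, anchors taken from the reference flanks, enumeration of simple paths only, and an indel reported merely as the length difference between an assembled path and the reference segment. It makes no attempt at quality scores, base-level alignment of complex indels, or sequencing errors.

```python
from collections import defaultdict


def expected_mate_start(mapped_pos, mapped_forward, read_len, insert_size):
    """Hypothetical placement of an unmapped read from its mapped partner.

    Positions are 0-based leftmost coordinates. Both reads are assumed to
    have length read_len, and the fragment length is assumed to equal
    insert_size exactly.
    """
    if mapped_forward:
        # Fragment spans [mapped_pos, mapped_pos + insert_size); the mate sits at its right end.
        return mapped_pos + insert_size - read_len
    # Partner is on the reverse strand, so the fragment ends at mapped_pos + read_len.
    return mapped_pos + read_len - insert_size


def candidate_regions(placements, window, min_reads):
    """Return start coordinates of fixed-size windows holding >= min_reads placed reads."""
    counts = defaultdict(int)
    for pos in placements:
        counts[pos // window] += 1
    return sorted(b * window for b, n in counts.items() if n >= min_reads)


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def all_paths(graph, start, end, max_nodes):
    """Depth-first enumeration of every simple path start -> end with <= max_nodes nodes."""
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end and len(path) > 1:
            paths.append(path)
            continue
        if len(path) >= max_nodes:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:  # skip revisits so cycles cannot be traversed forever
                stack.append((nxt, path + [nxt]))
    return paths


def path_to_sequence(path):
    """Spell a path: the first node, then the last base of each following node."""
    return path[0] + "".join(node[-1] for node in path[1:])


def local_assembly_indels(reference, reads, k, left, right, max_extra=20):
    """Assemble reads for reference[left:right] and report, for each assembled
    path between the flanking (k-1)-mer anchors, its length minus the reference
    segment length (positive = insertion, negative = deletion, 0 = reference)."""
    anchor = k - 1
    start = reference[left:left + anchor]
    end = reference[right - anchor:right]
    graph = build_de_bruijn(reads, k)
    ref_nodes = (right - left) - anchor + 1
    paths = all_paths(graph, start, end, ref_nodes + max_extra)
    return sorted(len(path_to_sequence(p)) - (right - left) for p in paths)
```

For example, tiling reads from a reference allele and from an allele carrying a 3 bp insertion over the same segment yields path-length differences of 0 and +3, the signature of a heterozygous insertion. Bounding the path length with `max_extra` keeps the exhaustive traversal finite, at the cost of missing insertions longer than that bound.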