Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061, USA.
Bioinformatics. 2013 Jul 15;29(14):1734-41. doi: 10.1093/bioinformatics/btt277. Epub 2013 May 15.
Simple tandem repeats are highly variable genetic elements that are widespread in the genomes of many organisms. Next-generation sequencing technologies have enabled robust comparison of large numbers of simple tandem repeat loci. However, analyzing their variation with traditional sequence analysis approaches remains limited and problematic: when sequence reads differ significantly from the reference sequence, variants within repeat sequences can cause alignment programs to map those reads to incorrect loci.
We have developed ReviSTER, an automated pipeline that uses a 'local mapping reference reconstruction method' to revise mismapped or partially misaligned reads at simple tandem repeat loci. ReviSTER estimates the alleles of repeat loci using a local alignment method, creates temporary local mapping reference sequences, and finally remaps reads to these local mapping references. Using this approach, ReviSTER successfully revised reads misaligned to repeat loci in both simulated and real data.
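The three steps described above (estimate alleles, build temporary local references, remap reads) can be illustrated with a toy sketch. This is not ReviSTER's implementation: exact matching of flanking sequence stands in for the local alignment step, and every function name, parameter, and sequence below is hypothetical and chosen only for illustration.

```python
from collections import Counter


def estimate_repeat_count(read, left_flank, right_flank, unit):
    """Count repeat units between the two flanks of a spanning read.

    Assumption: exact flank matching replaces a real local alignment.
    Returns None if the read does not span the locus cleanly.
    """
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    tract = read[start:end]
    count, remainder = divmod(len(tract), len(unit))
    if remainder != 0 or tract != unit * count:
        return None
    return count


def estimate_alleles(reads, left_flank, right_flank, unit, max_alleles=2):
    """Pick the most frequently observed repeat counts as candidate alleles."""
    counts = Counter()
    for read in reads:
        n = estimate_repeat_count(read, left_flank, right_flank, unit)
        if n is not None:
            counts[n] += 1
    return sorted(n for n, _ in counts.most_common(max_alleles))


def build_local_references(alleles, left_flank, right_flank, unit):
    """Build one temporary local reference sequence per candidate allele."""
    return {n: left_flank + unit * n + right_flank for n in alleles}


def remap(read, local_refs):
    """Return (allele, offset) for every local reference containing the read.

    Assumption: exact substring search replaces a real read mapper, so a
    read that fits several alleles is reported as ambiguous (several hits).
    """
    hits = []
    for n, ref in sorted(local_refs.items()):
        pos = ref.find(read)
        if pos != -1:
            hits.append((n, pos))
    return hits
```

In this sketch, a read that only partially covers the locus contributes nothing to allele estimation, but it can still be placed afterwards because it is compared against each reconstructed local reference rather than against the original reference alone.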
ReviSTER is open-source software available at http://revister.sourceforge.net.
Supplementary data are available at Bioinformatics online.