Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA.
Genomics. 2012 Nov;100(5):271-6. doi: 10.1016/j.ygeno.2012.07.015. Epub 2012 Aug 10.
Analysis of sequencing data remains limiting and problematic, especially for low-complexity repeat sequences and transposable elements, because of inherent sequencing errors and short read lengths. We have developed a program, ReviSeq, which uses a hybrid method that combines iterative remapping with local assembly on a bacterial sequence backbone. Applied to six Brucella suis field isolates and compared against the newly revised B. suis 1330 reference genome, this method identified on average 13, 15, 19 and 9 more variants per sample than the STAMPY/SAMtools, BWA/SAMtools, iCORN and BWA/PINDEL pipelines, respectively, and excluded on average 4, 2, 3 and 19 variants per sample relative to those same pipelines. In total, using this iterative approach, we identified on average 87 variants per strain relative to the reference, including SNVs, short INDELs and long INDELs. Our program outperforms the other methods, especially for long INDEL calling. The program is available at http://reviseq.sourceforge.net.
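To illustrate why iterating the remapping step can recover variants that a single mapping pass misses, the following is a minimal toy sketch in Python. It is not ReviSeq's implementation: it handles substitutions only, omits local assembly and INDELs entirely, and uses a brute-force Hamming-distance mapper in place of a real aligner. All function names, the random toy reference, the read length and the mismatch threshold are assumptions made for illustration.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, ref, max_mismatches):
    """Brute-force mapper: best ungapped position, or None if none is close enough."""
    best_pos, best_dist = None, max_mismatches + 1
    for pos in range(len(ref) - len(read) + 1):
        dist = hamming(read, ref[pos:pos + len(read)])
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    return best_pos


def call_consensus(reads, ref, max_mismatches):
    """One round: map reads, pile up bases, take the majority base per position.

    Positions not covered by any mapped read keep the reference base.
    """
    piles = [Counter() for _ in ref]
    for read in reads:
        pos = map_read(read, ref, max_mismatches)
        if pos is None:
            continue  # read differs too much from the current backbone
        for offset, base in enumerate(read):
            piles[pos + offset][base] += 1
    return "".join(
        pile.most_common(1)[0][0] if pile else ref_base
        for pile, ref_base in zip(piles, ref)
    )


def iterative_remap(reads, ref, max_mismatches=1, max_rounds=10):
    """Revise the backbone and remap until a round changes nothing."""
    rounds_with_changes = 0
    for _ in range(max_rounds):
        revised = call_consensus(reads, ref, max_mismatches)
        if revised == ref:
            break
        ref = revised
        rounds_with_changes += 1
    return ref, rounds_with_changes


def make_toy_data(length=80, read_len=20, snv_positions=(30, 38, 46), seed=0):
    """Hypothetical data: random reference, a sample with clustered SNVs,
    and error-free reads tiling the sample at every offset."""
    rng = random.Random(seed)
    ref = "".join(rng.choice("ACGT") for _ in range(length))
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    sample = list(ref)
    for pos in snv_positions:
        sample[pos] = swap[sample[pos]]
    sample = "".join(sample)
    reads = [sample[i:i + read_len] for i in range(length - read_len + 1)]
    return ref, sample, reads


if __name__ == "__main__":
    ref, sample, reads = make_toy_data()
    revised, rounds = iterative_remap(reads, ref)
    print("recovered sample:", revised == sample, "| revision rounds:", rounds)
```

In this toy setup, every read spanning the middle SNV also spans a neighbouring SNV, so with a one-mismatch limit none of them map in the first round. Once the outer SNVs have been written into the backbone, those reads map and the middle SNV becomes callable. This is only meant to convey the general idea of iterative remapping against a revised backbone under the stated assumptions.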