Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA;
Genome Res. 2014 Feb;24(2):310-7. doi: 10.1101/gr.162883.113. Epub 2013 Dec 4.
Recent progress in next-generation sequencing has greatly facilitated our study of genomic structural variation. Unlike single nucleotide variants and small indels, many structural variants have not been completely characterized at nucleotide resolution. Deriving the complete sequences underlying their breakpoints is crucial not only for accurate discovery but also for the functional characterization of altered alleles. However, our current ability to determine such breakpoint sequences is limited by the challenges of aligning and assembling short reads. To address this issue, we developed a targeted iterative graph routing assembler, TIGRA, which implements a set of novel data analysis routines to achieve effective breakpoint assembly from next-generation sequencing data. In our assessment using data from the 1000 Genomes Project, TIGRA accurately assembled the majority of deletion and mobile element insertion breakpoints, with a substantially better success rate and accuracy than other algorithms. TIGRA has been applied in the 1000 Genomes Project and other projects and is freely available for academic use.
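The abstract does not describe TIGRA's routines in detail. As a rough, hypothetical illustration of the general idea behind targeted, graph-based breakpoint assembly, the Python sketch below assumes the reads have already been collected from around a candidate breakpoint. It builds a de Bruijn graph for each of several k-mer sizes, walks unambiguous paths into contigs, and keeps the longest contig. The function names (`build_debruijn`, `walk_unambiguous`, `assemble_breakpoint`) and the default k values are assumptions made for illustration. This is not TIGRA's actual implementation.

```python
from collections import defaultdict


def build_debruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles (e.g. tandem repeats)
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


def assemble_breakpoint(reads, ks=(25, 15)):
    """Hypothetical driver: try several k values and keep the longest contig found."""
    best = ""
    for k in ks:
        graph = build_debruijn(reads, k)
        # Source nodes: (k-1)-mers that no edge points into.
        targets = {t for succ in graph.values() for t in succ}
        sources = [n for n in graph if n not in targets]
        for s in sources:
            contig = walk_unambiguous(graph, s)
            if len(contig) > len(best):
                best = contig
    return best


reads = ["ATGCGTAC", "CGTACGTTA", "ACGTTAGC"]
print(assemble_breakpoint(reads, ks=(7,)))     # CGTACGTTAGC
print(assemble_breakpoint(reads, ks=(7, 5)))   # ATGCGTACGTTAGC
```

With these toy reads, k = 7 cannot bridge the 5-base overlap between the first two reads, so the longest contig covers only part of the sequence. Adding k = 5 recovers the full 14-base sequence. This shows, in miniature, why trying more than one k-mer size can help when read overlaps near a breakpoint are short.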