Department of Computer Engineering, Bilkent University, Ankara, Turkey.
Department of Biochemistry and Molecular Medicine, MIND Institute and UC-Davis Genome Center, University of California, Davis, CA, United States.
Methods. 2017 Oct 1;129:3-7. doi: 10.1016/j.ymeth.2017.05.030. Epub 2017 Jun 2.
Structural variations (SVs) are broadly defined as genomic alterations that affect more than 50 bp of DNA, and they have been shown to have significant effects on evolution and disease. The advent of high-throughput sequencing (HTS) technologies and the ability to perform whole genome sequencing (WGS) make it feasible to study these variants in depth. However, discovering all forms of SV using WGS has proven challenging. The predominant HTS platforms produce short reads (<200 bp for current technologies), and most genomes contain large amounts of repeats. Together, these make it very difficult to map such variants unambiguously and characterize them accurately. Furthermore, existing SV discovery tools are primarily developed for only a few SV types, whose sequence signatures (i.e., read pairs, read depth, and split reads) may conflict with those of other, untargeted SV classes. Here we introduce a new framework, Tardis, which combines multiple read signatures into a single package to characterize most SV types simultaneously while preventing such conflicts. Tardis also has a modular structure that makes it easy to extend to the discovery of additional forms of SV.