Schröder Jan, Wirawan Adrianto, Schmidt Bertil, Papenfuss Anthony T
Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC, 3052, Australia.
Department of Computing and Information Systems, University of Melbourne, Melbourne, VIC, Australia.
BMC Bioinformatics. 2017 Jul 20;18(1):346. doi: 10.1186/s12859-017-1760-3.
A precise understanding of structural variants (SVs) in DNA is important in the study of cancer and population diversity. Many methods have been designed to identify SVs from DNA sequencing data. However, the problem remains challenging because existing approaches suffer from low sensitivity, precision, and positional accuracy. Furthermore, many existing tools only identify breakpoints, and so do not collect related breakpoints and classify them as a particular type of SV. Due to the rapidly increasing usage of high-throughput sequencing technologies in this area, there is an urgent need for algorithms that can accurately classify complex genomic rearrangements (involving more than one breakpoint or fusion).
We present CLOVE, an algorithm for integrating the results of multiple breakpoint or SV callers and classifying them as particular types of SV. CLOVE is based on a graph data structure that is created from the breakpoint information. The algorithm looks for patterns in the graph that are characteristic of more complex rearrangement types. By integrating the results of multiple callers, CLOVE produces a consensus call.
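The general idea of a breakpoint graph with pattern matching can be illustrated with a minimal sketch. This is not CLOVE's implementation: the `Junction` record, the single-linkage clustering with a fixed `tolerance`, the strand-pair encoding, and the three pattern rules are all assumptions chosen for illustration. Each junction is assumed to be reported with its first end at or before its second end in genomic order.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """One breakpoint call joining two genomic positions (hypothetical record)."""
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def cluster_positions(points, tolerance):
    """Assign a node id to each (chrom, pos); sorted neighbours on the same
    chromosome that lie within `tolerance` of each other share a node."""
    node_of = {}
    node_id = -1
    prev = None
    for chrom, pos in sorted(set(points)):
        if prev is None or prev[0] != chrom or pos - prev[1] > tolerance:
            node_id += 1
        node_of[(chrom, pos)] = node_id
        prev = (chrom, pos)
    return node_of


def build_graph(junctions, tolerance=10):
    """Nodes are clustered loci; each edge stores the set of strand pairs
    reported for it, so calls from several callers merge into one edge."""
    points = [(j.chrom1, j.pos1) for j in junctions]
    points += [(j.chrom2, j.pos2) for j in junctions]
    node_of = cluster_positions(points, tolerance)
    edges = defaultdict(set)
    for j in junctions:
        a = node_of[(j.chrom1, j.pos1)]
        b = node_of[(j.chrom2, j.pos2)]
        edges[(a, b)].add(j.strand1 + j.strand2)
    return dict(edges)


def classify(edges):
    """Label each edge using simple, assumed orientation patterns."""
    calls = {}
    for pair, orientations in edges.items():
        if {"++", "--"} <= orientations:
            calls[pair] = "inversion"  # both inversion junctions present
        elif orientations == {"+-"}:
            calls[pair] = "deletion"
        elif orientations == {"-+"}:
            calls[pair] = "tandem duplication"
        else:
            calls[pair] = "unclassified"
    return calls


# Two callers each report one side of the same inversion, a few bases apart.
calls_a = [Junction("chr1", 1000, "+", "chr1", 5000, "+")]
calls_b = [Junction("chr1", 1003, "-", "chr1", 5004, "-")]
result = classify(build_graph(calls_a + calls_b))
```

In this toy example the two nearby junctions collapse onto the same pair of nodes, and the combined edge matches the inversion pattern, whereas either call alone would remain unclassified. A realistic classifier would need to handle multi-edge patterns, inter-chromosomal events, and read-depth evidence, none of which are modelled here.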
We demonstrate using simulated and real data that re-classified SV calls produced by CLOVE improve on the raw call set of existing SV algorithms, particularly in terms of accuracy. CLOVE is freely available from http://www.github.com/PapenfussLab .