Department of Physics and Computer Science, Xavier University of Louisiana, New Orleans, Louisiana, USA.
Department of Biology, Xavier University of Louisiana, New Orleans, Louisiana, USA.
J Comput Biol. 2021 Feb;28(2):185-194. doi: 10.1089/cmb.2020.0249. Epub 2020 Aug 12.
Complex genomic structural variants (CGSVs) are abnormalities that present with three or more breakpoints, making their discovery a challenge. Most existing algorithms for structural variant detection are designed to find only simple structural variants (SSVs), such as deletions and inversions; they fail to find more complex events such as deletion-inversions or deletion-duplications. In this study, we present an algorithm named CleanBreak that employs a clique-partitioning, graph-based strategy to identify collections of SSV clusters and then identifies overlapping SSV clusters to examine the search space of possible CGSVs, choosing the one that is most concordant with local read depth. We evaluated CleanBreak's performance on simulated whole-genome data and a real data set from the 1000 Genomes Project. We also compared CleanBreak with another algorithm for CGSV discovery. The results demonstrate CleanBreak's utility as an effective method to discover CGSVs.
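The clustering idea in the abstract can be illustrated with a minimal sketch. This is not CleanBreak's implementation: the `ReadPair` record, the `max_dist` compatibility rule, and the greedy partitioning heuristic are all assumptions chosen for illustration, and the final step of scoring candidates against local read depth is omitted. Discordant read pairs are treated as graph nodes, two pairs are joined by an edge when they share a signature and map close together, and the nodes are greedily partitioned into cliques that stand in for SSV clusters. Clusters whose spans overlap are then reported as candidate components of a CGSV.

```python
from typing import List, NamedTuple, Tuple


class ReadPair(NamedTuple):
    """Hypothetical discordant read pair (not CleanBreak's data model)."""
    left: int    # mapped position of the left mate
    right: int   # mapped position of the right mate
    kind: str    # assumed SSV signature label, e.g. "DEL" or "INV"


def compatible(a: ReadPair, b: ReadPair, max_dist: int) -> bool:
    """Assumed edge rule: same signature and both ends within max_dist."""
    return (a.kind == b.kind
            and abs(a.left - b.left) <= max_dist
            and abs(a.right - b.right) <= max_dist)


def greedy_clique_partition(pairs: List[ReadPair],
                            max_dist: int) -> List[List[ReadPair]]:
    """Greedily partition read pairs into cliques of mutually compatible pairs.

    Each pair joins the first existing clique in which it is compatible
    with every member; otherwise it starts a new clique. This is a
    heuristic, not an optimal clique partition.
    """
    cliques: List[List[ReadPair]] = []
    for p in sorted(pairs, key=lambda r: (r.left, r.right)):
        for clique in cliques:
            if all(compatible(p, q, max_dist) for q in clique):
                clique.append(p)
                break
        else:
            cliques.append([p])
    return cliques


def cluster_span(clique: List[ReadPair]) -> Tuple[int, int]:
    """Genomic interval covered by a cluster."""
    return (min(p.left for p in clique), max(p.right for p in clique))


def overlapping_clusters(cliques: List[List[ReadPair]]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, of clusters whose spans overlap."""
    spans = [cluster_span(c) for c in cliques]
    hits = []
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            if spans[i][0] <= spans[j][1] and spans[j][0] <= spans[i][1]:
                hits.append((i, j))
    return hits
```

For example, three deletion-type pairs near positions 100 to 510 and two inversion-type pairs near 450 to 910 would form two clusters with overlapping spans, which this sketch would flag as a candidate deletion-inversion to be checked against read depth.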