Department of Computing, Imperial College London, London, SW7 2AZ, UK.
Systems Biology PhD Program, Columbia University in New York City, New York, USA.
BMC Bioinformatics. 2020 Feb 5;21(1):45. doi: 10.1186/s12859-020-3367-3.
Popular variant calling pipelines detect variants from the coordinates at which each input read maps to a reference genome. Reads deriving from variant loci that diverge substantially in sequence from the reference are often assigned incorrect mapping coordinates, so pipelines that rely on those coordinates can exhibit reduced sensitivity.
In this work we present GeDi, a suffix array-based somatic single nucleotide variant (SNV) calling algorithm that does not rely on read mapping coordinates to detect SNVs and is therefore capable of reference-free and mapping-free SNV detection. GeDi executes with practical runtime and memory resource requirements, is capable of SNV detection at very low allele frequency (<1%), and detects SNVs with high sensitivity at complex variant loci, dramatically outperforming MuTect, a well-established pipeline.
By designing novel suffix array-based SNV calling methods, we have developed a practical SNV calling tool, GeDi, that can characterise SNVs at complex variant loci and at low allele frequency, thus increasing the repertoire of detectable SNVs in tumour genomes. We expect GeDi to find use in targeted deep sequencing analysis, and to serve as a replacement for, and an improvement over, previous suffix array-based SNV calling methods.
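To make the idea of mapping-free detection concrete, the following is a minimal sketch, not GeDi's actual algorithm. It assumes a simplified scheme in which a suffix array is built over the concatenated reads of each sample and used to count k-mers; k-mers that are well supported in the tumour reads but absent from the normal reads are reported as candidate variant-supporting sequences. The function names, the `$` read separator, the naive suffix array construction, and the `min_support` threshold are all illustrative assumptions, and reverse complements and sequencing errors are ignored.

```python
def build_suffix_array(text):
    """Return suffix start positions in lexicographic order.

    Naive construction for illustration only; real tools use
    linear-time or compressed suffix array builders.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def _bound(text, sa, pattern, upper):
    """Binary search over the suffix array, comparing k-length prefixes.

    Returns the first index whose prefix is >= pattern (upper=False)
    or > pattern (upper=True).
    """
    k = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + k]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def count_occurrences(text, sa, pattern):
    """Count occurrences of pattern in text using the suffix array."""
    return _bound(text, sa, pattern, True) - _bound(text, sa, pattern, False)


def tumour_only_kmers(tumour_reads, normal_reads, k, min_support=2):
    """Return sorted k-mers seen >= min_support times in tumour reads
    and never in normal reads (hypothetical candidate criterion)."""
    # '$' separates reads so that no counted k-mer spans two reads.
    tumour_text = "$".join(tumour_reads) + "$"
    normal_text = "$".join(normal_reads) + "$"
    tumour_sa = build_suffix_array(tumour_text)
    normal_sa = build_suffix_array(normal_text)

    kmers = {
        read[i:i + k]
        for read in tumour_reads
        for i in range(len(read) - k + 1)
    }
    candidates = []
    for kmer in kmers:
        in_tumour = count_occurrences(tumour_text, tumour_sa, kmer)
        in_normal = count_occurrences(normal_text, normal_sa, kmer)
        if in_tumour >= min_support and in_normal == 0:
            candidates.append(kmer)
    return sorted(candidates)


if __name__ == "__main__":
    # Toy data: two tumour reads carry an A>T change at the fifth base.
    normal = ["ACGTACGTAC", "ACGTACGTAC"]
    tumour = ["ACGTTCGTAC", "ACGTTCGTAC", "ACGTACGTAC"]
    print(tumour_only_kmers(tumour, normal, k=4))
```

In this toy input no reference genome or read alignment is used at any point: the tumour-specific k-mers are found purely by comparing the two read sets, which is the property the abstract describes as reference-free and mapping-free detection.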