School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada.
Department of Urologic Sciences, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia V5Z 1M9, Canada.
Nucleic Acids Res. 2019 Apr 23;47(7):e38. doi: 10.1093/nar/gkz067.
Cancer is a complex disease that involves rapidly evolving cells, often forming multiple distinct clones. To understand the progression of a patient-specific tumor, one needs to comprehensively sample tumor DNA at multiple time points, ideally through inexpensive and minimally invasive techniques. Current sequencing technologies make the 'liquid biopsy' possible, which involves sampling a patient's blood or urine and sequencing the circulating cell-free DNA (cfDNA). A fraction of this DNA, known as circulating tumor DNA (ctDNA), originates from the tumor. The proportion of ctDNA in the sample may be extremely low, and the ctDNA may originate from multiple tumors or clones. These factors present unique challenges for applying existing tools and workflows to the analysis of ctDNA, especially in the detection of structural variations, which rely on sufficient read coverage to be detectable.
Here we introduce SViCT, a structural variation (SV) detection tool designed to handle the challenges associated with cfDNA analysis. SViCT can detect breakpoints and sequences of various structural variations, including deletions, insertions, inversions, duplications and translocations. SViCT extracts discordant read pairs, one-end anchors and soft-clipped/split reads, assembles them into contigs, and re-maps contig intervals to a reference genome using an efficient k-mer indexing approach. The intervals are then joined using a combination of graph and greedy algorithms to identify specific structural variant signatures. We assessed the performance of SViCT and compared it to state-of-the-art tools using simulated cfDNA datasets with properties matching those of real cfDNA samples. The positive predictive value and sensitivity of our tool were superior to those of all the tested tools, and reasonable performance was maintained down to the lowest dilution of 0.01% tumor DNA in simulated datasets. Additionally, SViCT was able to detect all known SVs in two real cfDNA reference datasets (at 0.6-5% ctDNA) and predict a novel structural variant in a prostate cancer cohort.
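As an illustration of the read-extraction step described above, the following is a minimal sketch of how an alignment record might be sorted into the three evidence classes (discordant pair, one-end anchor, soft-clipped read) using only its SAM flag and CIGAR string. It is not SViCT's implementation: the function name `classify_read`, the label strings and the `min_clip` threshold are hypothetical, and "discordant" is approximated here as "both mates mapped but not flagged as a proper pair", which depends on how the aligner sets that flag.

```python
import re

# Flag bits as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8

# Matches the length of each soft-clip operation in a CIGAR string.
SOFT_CLIP = re.compile(r"(\d+)S")


def classify_read(flag: int, cigar: str, min_clip: int = 10) -> set:
    """Return hypothetical SV-evidence labels for one alignment record.

    min_clip is an assumed threshold: shorter soft clips are ignored
    because they are often adapter or low-quality bases.
    """
    labels = set()
    paired = bool(flag & PAIRED)
    read_unmapped = bool(flag & UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)

    if paired and read_unmapped != mate_unmapped:
        # Exactly one mate of the pair is mapped.
        labels.add("one_end_anchor")
    elif paired and not read_unmapped and not mate_unmapped \
            and not flag & PROPER_PAIR:
        # Both mates mapped, but not with the expected orientation/distance.
        labels.add("discordant")

    if not read_unmapped and cigar != "*":
        clips = [int(n) for n in SOFT_CLIP.findall(cigar)]
        if clips and max(clips) >= min_clip:
            labels.add("soft_clipped")

    return labels
```

For example, `classify_read(97, "30S70M")` yields both `"discordant"` and `"soft_clipped"`, whereas a properly paired, fully matched read such as `classify_read(99, "100M")` yields no labels and would be skipped. In a pipeline of the kind described, only the labelled reads would be passed on to contig assembly.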
SViCT is available at https://github.com/vpc-ccg/svict. Contact: faraz.hach@ubc.ca.