School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
Department of Oncology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, 200433, China.
BMC Bioinformatics. 2020 Apr 3;21(1):127. doi: 10.1186/s12859-020-3412-2.
Hybrid capture-based next-generation sequencing of DNA has been widely applied in the detection of circulating tumor DNA (ctDNA). Various methods have been proposed for ctDNA detection, but variants with a low allelic fraction (AF) remain a great challenge. In addition, no panel-wide calling algorithm is available, which hinders the full use of ctDNA-based 'liquid biopsy'. Thus, we developed VBCALAVD (Virtual Barcode-based Calling Algorithm for Low Allelic Variant Detection) in silico to overcome these limitations.
Based on the understanding of the nature of ctDNA fragmentation, a novel platform-independent virtual barcode strategy was established to eliminate random sequencing errors by clustering sequencing reads into virtual families. Stereotypical mutant-family-level background artifacts were polished by constructing AF distributions. Three additional robust fine-tuning filters were obtained to eliminate stochastic mutant-family-level noise. The performance of our algorithm was validated using cell-free DNA reference standard samples (cfDNA RSDs) and normal healthy cfDNA samples (cfDNA controls). For the RSDs with AFs of 0.1, 0.2, 0.5, 1 and 5%, the mean F1 scores were 0.43 (0.25~0.56), 0.77, 0.92, 0.926 (0.86~1.0) and 0.89 (0.75~1.0), respectively, which indicates that the proposed approach significantly outperforms the published algorithms. Among controls, no false positives were detected. Meanwhile, the characteristics of mutant-family-level noise and the quantitative determinants of divergence between mutant-family-level noise from controls and RSDs were clearly depicted.
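The virtual-family idea can be illustrated with a minimal Python sketch. It is not the VBCALAVD implementation: it assumes that a virtual barcode is simply the pair of fragment start and end coordinates on a chromosome, and the names `Read`, `group_virtual_families`, `family_consensus` and `allelic_fraction`, as well as the thresholds `min_size` and `min_agreement`, are hypothetical choices made for illustration only.

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Optional


class Read(NamedTuple):
    """Hypothetical, simplified read: fragment coordinates plus the base at one site."""
    chrom: str
    start: int  # fragment start coordinate
    end: int    # fragment end coordinate
    base: str   # base observed at the site of interest


def group_virtual_families(reads):
    """Cluster reads that share fragment endpoints (the assumed virtual barcode)."""
    families = defaultdict(list)
    for r in reads:
        families[(r.chrom, r.start, r.end)].append(r.base)
    return dict(families)


def family_consensus(bases, min_size=2, min_agreement=0.8) -> Optional[str]:
    """Return the majority base of a family, or None if the family is too
    small or too discordant (illustrative thresholds, not from the paper)."""
    if len(bases) < min_size:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_agreement else None


def allelic_fraction(reads, alt, min_size=2, min_agreement=0.8):
    """Family-level AF: families supporting `alt` / families with a consensus."""
    calls = [
        family_consensus(bases, min_size, min_agreement)
        for bases in group_virtual_families(reads).values()
    ]
    calls = [c for c in calls if c is not None]
    return calls.count(alt) / len(calls) if calls else 0.0


reads = [
    Read("chr1", 100, 266, "A"), Read("chr1", 100, 266, "A"), Read("chr1", 100, 266, "A"),
    Read("chr1", 105, 270, "T"), Read("chr1", 105, 270, "T"),
    Read("chr1", 110, 280, "A"), Read("chr1", 110, 280, "T"),  # discordant family
    Read("chr1", 120, 290, "T"),                               # singleton family
]
families = group_virtual_families(reads)
af_t = allelic_fraction(reads, "T")
```

In this toy input, the discordant family and the singleton yield no consensus, so a mismatch seen in only part of a family is treated as a likely random sequencing error rather than as a variant. The background polishing and fine-tuning filters described above operate on top of such family-level calls and are not modelled here.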
Due to its good performance in the detection of low-AF variants, our algorithm will greatly facilitate the noninvasive panel-wide detection of ctDNA in research and clinical settings. The whole pipeline is available at https://github.com/zhaodalv/VBCALAVD.