Genetron Health (Beijing) Co. Ltd, Beijing 102208, China.
State Key Laboratory of Medical Molecular Biology, Center for Bioinformatics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing 100005, China.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab458.
The application of next-generation sequencing in research, and particularly in routine clinical practice, requires highly accurate variant calling. Here we describe UVC, a method for calling small variants of germline or somatic origin. By unifying opposite assumptions with sublation, we discovered the following two empirical laws that improve variant calling: allele fraction at high sequencing depth is inversely proportional to the cubic root of the variant-calling error rate, and odds ratios adjusted with Bayes factors can model various sequencing biases. UVC outperformed other variant callers on the GIAB germline truth sets, on 192 scenarios of in silico mixtures simulating 192 combinations of tumor/normal sequencing depths and tumor/normal purities, on the GIAB somatic truth sets derived from physical mixtures, and on the SEQC2 somatic reference sets derived from the breast-cancer cell line HCC1395. UVC achieved 100% concordance with the manual review conducted by multiple independent researchers on a Qiagen 71-gene-panel dataset derived from 16 patients with colon adenoma. UVC also outperformed other unique molecular identifier (UMI)-aware variant callers on the datasets used in the publications of those variant callers. Performance was measured by the sensitivity-specificity trade-off of the called variants. The improved variant calls generated by UVC from previously published UMI-based sequencing data provided additional insight into DNA damage repair. UVC is open-sourced under the BSD 3-Clause license at https://github.com/genetronhealth/uvc and quay.io/genetronhealth/gcc-6-3-0-uvc-0-6-0-441a694.
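The first empirical law can be illustrated with a short numerical sketch. If allele fraction is inversely proportional to the cubic root of the error rate, then the error rate scales as the inverse cube of the allele fraction, so doubling the allele fraction divides the error rate by 8 (a gain of about 9 on the Phred scale). The abstract states only the proportionality; the function names and the reference point used below (allele fraction 0.01 with error rate 1e-3) are hypothetical and are not taken from UVC.

```python
import math


def scaled_error_rate(af: float, ref_af: float, ref_error: float) -> float:
    """Extrapolate an error rate to allele fraction `af` from a reference point,
    assuming error_rate is proportional to af ** -3 (the stated cubic-root law).
    The reference point is a hypothetical calibration value, not a UVC constant."""
    if af <= 0 or ref_af <= 0 or ref_error <= 0:
        raise ValueError("allele fractions and error rate must be positive")
    return ref_error * (ref_af / af) ** 3


def phred(error_rate: float) -> float:
    """Convert an error probability to the Phred scale: -10 * log10(p)."""
    return -10.0 * math.log10(error_rate)


# Hypothetical reference point: allele fraction 1% with error rate 1e-3.
ref_af, ref_error = 0.01, 1e-3
doubled = scaled_error_rate(2 * ref_af, ref_af, ref_error)

# Doubling the allele fraction divides the error rate by 2 ** 3 = 8,
# which equals a Phred gain of 30 * log10(2).
phred_gain = phred(doubled) - phred(ref_error)
print(f"error rate at 2x allele fraction: {doubled:.3g}")
print(f"Phred gain: {phred_gain:.2f}")
```

On the Phred scale the same relationship reads Q = 30 * log10(allele fraction) + constant, which is why the gain per doubling does not depend on the chosen reference point.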