Cheng Yuhe, Nandi Shuvro P, Culibrk Luka, Kristin Audrey, Stuewe Isabella, Al-Azzam Shams, Petljak Mia, Alexandrov Ludmil B
Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, 92093, USA.
Department of Bioengineering, UC San Diego, La Jolla, CA, 92093, USA.
bioRxiv. 2025 Jul 18:2025.07.13.664565. doi: 10.1101/2025.07.13.664565.
Duplex sequencing enables highly accurate detection of rare somatic mutations, but existing variant callers often rely on protocol-specific heuristics that limit sensitivity, reproducibility, and cross-study comparability. We present DupCaller, a probabilistic variant caller that builds sample-specific error profiles and applies a strand-aware statistical model for mutation detection. Across 50 synthetic datasets, DupCaller identified 1.25-fold more single-base substitutions (SBSs) and 1.41-fold more indels than a state-of-the-art method, while exhibiting equal or better precision. In three duplex-sequenced cell lines treated with aristolochic acid, it recovered expected mutational signatures while detecting 3.5-fold more SBSs and 2.8-fold more indels. In 93 tissue samples (including neurons, cord blood, sperm, saliva, and blood), DupCaller showed consistent gains, detecting 1.21- to 2.7-fold more mutations. Sensitivity scaled with sample duplication rate, yielding approximately 1.5-fold more mutations under optimal conditions and over 3-fold more in low-duplication samples where other tools falter. These results establish DupCaller as a robust and scalable solution for somatic mutation profiling in duplex sequencing across diverse biological and technical contexts.