Cheng Alexandre Pellan, Rusinek Itai, Sossin Aaron, Widman Adam J, Meiri Eti, Krieger Gat, Hirschberg Ori, Tov Doron Shem, Gilad Shlomit, Jaimovich Ariel, Barad Omer, Avaylon Sammantha, Rajagopalan Srinivas, Potenski Catherine, Prieto Tamara, Yuan Dennis J, Furatero Rob, Runnels Alexi, Costa Benjamin M, Shoag Jonathan E, Assaad Majd Al, Sigouros Michael, Manohar Jyothi, King Abigail, Wilkes David, Otilano John, Malbari Murtaza S, Elemento Olivier, Mosquera Juan Miguel, Altorki Nasser K, Saxena Ashish, Callahan Margaret K, Robine Nicolas, Germer Soren, Evrony Gilad D, Faltas Bishoy M, Landau Dan-Avi
École de Technologie Supérieure, Montréal, Québec, Canada.
Centre de Recherche du Centre Hospitalier de l'Université de Montréal, Montréal, Québec, Canada.
bioRxiv. 2025 Aug 14:2025.08.11.669689. doi: 10.1101/2025.08.11.669689.
Distinguishing real biological variation in the form of single-nucleotide variants (SNVs) from errors is a major challenge for genome sequencing technologies. This is particularly true in settings where SNVs occur at low frequency, such as cancer detection through liquid biopsy or human somatic mosaicism. State-of-the-art molecular denoising approaches for DNA sequencing rely on duplex sequencing, in which both strands of a single DNA molecule are sequenced to distinguish true variants from errors arising from single-stranded DNA damage. However, such duplex approaches typically require massive over-sequencing to overcome low capture rates of duplex molecules. To address these challenges, we introduce paired plus-minus sequencing (ppmSeq) technology, in which both DNA strands are partitioned and clonally amplified on sequencing beads through emulsion PCR. In this reaction, both strands of a double-stranded DNA molecule contribute to a single sequencing read, allowing for a duplex yield that scales linearly with sequencing coverage across a wide range of inputs (1.8-98 ng). We benchmarked ppmSeq against current duplex sequencing technologies, demonstrating superior duplex recovery with ppmSeq, with a rate of 44% ± 5.5% (compared to ~5-11% for leading duplex technologies). Using both genomic DNA (gDNA) and cell-free DNA, we established error rates for ppmSeq, which had residual SNV detection error rates as low as 7.98×10⁻⁵ for gDNA (using an end-repair protocol with dideoxy nucleotides) and 3.5×10⁻⁴ ± 7.5×10⁻⁵ for cell-free DNA. To test the capabilities of ppmSeq for error-corrected whole-genome sequencing (WGS) in clinical applications, we assessed circulating tumor DNA (ctDNA) detection for disease monitoring in cancer patients. We demonstrated that ppmSeq enables powerful tumor-informed ctDNA detection at concentrations of 10⁻⁶ across most cancers, and up to 10⁻⁷ in cancers with high mutation burden.
We then leveraged genome-wide trinucleotide mutation patterns characteristic of urothelial (APOBEC3-related and platinum exposure-related signatures) and lung (tobacco exposure-related signatures) cancers to perform tumor-naive ctDNA detection, showing that ppmSeq can identify a disease-specific signal in plasma cell-free DNA without a matched tumor, and that this signal correlates with imaging-based disease metrics. Altogether, ppmSeq provides an error-corrected, cost-efficient and scalable approach for high-fidelity WGS that can be harnessed for challenging clinical applications and emerging frontiers in human somatic genetics where high accuracy is required for mutation identification.
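Two quantities in the abstract lend themselves to back-of-the-envelope arithmetic: how many raw reads a given duplex recovery rate implies, and how tumor-informed ctDNA signal compares with residual sequencing error once counts are summed over many known mutation sites. The Python sketch below is only an illustration under stated assumptions, not the authors' analysis pipeline. It assumes that a fixed fraction of raw reads is duplex (a simple linear model), that tumor fraction can be treated as the per-site variant allele fraction, and that background errors follow a Poisson distribution. All function names and the example site count, depth, tumor fraction and error rate are hypothetical; only the 44% and ~5-11% recovery rates come from the abstract.

```python
import math


def raw_reads_needed(duplex_reads: float, duplex_recovery: float) -> float:
    """Raw reads required to obtain `duplex_reads` duplex reads.

    Assumes a fixed fraction `duplex_recovery` of raw reads is duplex
    (a simplifying linear model, not taken from the paper).
    """
    if not 0 < duplex_recovery <= 1:
        raise ValueError("duplex_recovery must be in (0, 1]")
    return duplex_reads / duplex_recovery


def expected_counts(n_sites: int, depth: float,
                    tumor_fraction: float, error_rate: float):
    """Expected (signal, noise) read counts summed over tumor-informed sites.

    Assumes tumor fraction equals the per-site variant allele fraction and
    that `error_rate` is the per-base chance of a matching false call.
    """
    observations = n_sites * depth        # total bases examined at known sites
    signal = observations * tumor_fraction
    noise = observations * error_rate
    return signal, noise


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)                 # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i                   # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    # Recovery rates quoted in the abstract: 44% versus roughly 5-11%.
    for rate in (0.44, 0.11, 0.05):
        print(f"recovery {rate:.2f}: {raw_reads_needed(1e9, rate):.3g} raw reads")

    # Hypothetical inputs, chosen only to illustrate the calculation.
    signal, noise = expected_counts(n_sites=10_000, depth=30,
                                    tumor_fraction=1e-5, error_rate=1e-6)
    observed = round(signal + noise)
    print(f"expected signal={signal:.2f}, noise={noise:.2f}, "
          f"P(>= {observed} reads from error alone)={poisson_tail(observed, noise):.3g}")
```

Under these assumptions, both expected signal and expected noise grow in proportion to the number of informative sites, which is one way to see why cancers with a high mutation burden can support detection at lower ctDNA concentrations.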