Noyes Michelle D, Sui Yang, Kwon Youngjun, Koundinya Nidhi, Wong Isaac, Munson Katherine M, Hoekzema Kendra, Kordosky Jennifer, Garcia Gage H, Knuth Jordan, Lewis Alexandra P, Eichler Evan E
Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
bioRxiv. 2025 Jul 19:2025.07.18.665621. doi: 10.1101/2025.07.18.665621.
Long-read sequencing (LRS) has improved sensitivity to discover variation in complex repetitive regions, assign parent-of-origin, and distinguish germline from postzygotic mutations (PZMs). Most studies have been limited to population genetic surveys or a few families. We applied three orthogonal sequencing technologies (Illumina, Oxford Nanopore Technologies, and Pacific Biosciences) to discover and validate de novo mutations (DNMs) in 73 children from 42 autism families (157 individuals). Assaying 2.77 Gbp of the human genome using read-based approaches, we discover on average 95 DNMs per transmission (87.5 single-nucleotide variants and 7.8 indels), including sex chromosomes. We estimate that LRS increases DNM discovery by 20-40% over previous Illumina-based studies of the same families, and more than doubles the number of discoverable PZMs that emerged early in embryonic development. The strict germline mutation rate is 1.30×10⁻⁸ substitutions per base pair per generation, strongly driven by the father's germline (3.95:1), while PZMs increase the rate by 0.23×10⁻⁸ with a modest but significant bias toward paternal haplotypes (1.15:1). We show that the mutation rate is significantly increased for classes of repetitive DNA, where segmental duplication (SD) mutation shows a dependence on the length and percent identity of the SD. We find that the mutation rate enrichment in repeats occurs predominantly postzygotically as opposed to in the germline, a likely result of faulty DNA repair and interlocus gene conversion.
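As a rough reading aid, the figures quoted in the abstract can be related to one another with a few lines of arithmetic. The sketch below is not the authors' method: the function name `paternal_share` is hypothetical, and dividing by two copies of the 2.77 Gbp assayed sequence is an assumption about how "per transmission" maps onto a per-base-pair rate. It only checks that the reported numbers are of a consistent order of magnitude.

```python
# Back-of-envelope check of the abstract's figures.
# Assumption (not from the source): one "transmission" covers two
# haploid copies of the 2.77 Gbp assayed sequence.

def paternal_share(ratio: float) -> float:
    """Convert a paternal:maternal ratio (e.g. 3.95 for 3.95:1) to a paternal fraction."""
    return ratio / (ratio + 1.0)

ASSAYED_BP = 2.77e9            # assayed sequence, from the abstract
SNVS_PER_TRANSMISSION = 87.5   # mean SNVs per transmission, from the abstract
GERMLINE_RATE = 1.30e-8        # substitutions per bp per generation
PZM_RATE = 0.23e-8             # additional rate contributed by PZMs

# Rough per-base-pair rate under the diploid-denominator assumption.
rough_rate = SNVS_PER_TRANSMISSION / (2 * ASSAYED_BP)

# Combined reported rate (germline plus postzygotic).
reported_total = GERMLINE_RATE + PZM_RATE

print(f"rough rate from counts: {rough_rate:.2e}")
print(f"reported germline+PZM:  {reported_total:.2e}")
print(f"paternal share, germline: {paternal_share(3.95):.1%}")
print(f"paternal share, PZM:      {paternal_share(1.15):.1%}")
```

The rough rate and the reported total will not match exactly, since the true denominator depends on per-sample callable sequence and on how sex chromosomes are counted; agreement within a few percent is all this check is meant to show.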