Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA.
Nature. 2023 May;617(7960):325-334. doi: 10.1038/s41586-023-05895-y. Epub 2023 May 10.
Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences.
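The two kinds of quantity reported in the abstract, a relative elevation in SNV density and a breakdown of SNVs by mutation class, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `relative_change` and `classify_snv`, the triplet-string input format, and the toy counts in the usage lines are all assumptions made for illustration.

```python
def relative_change(sd_count, sd_bp, unique_count, unique_bp):
    """Fractional change in SNV density in SDs relative to unique sequence.

    A return value of 0.60 would correspond to a 60% elevation.
    All four inputs are hypothetical tallies supplied by the caller.
    """
    sd_density = sd_count / sd_bp
    unique_density = unique_count / unique_bp
    return sd_density / unique_density - 1.0


def classify_snv(triplet, alt):
    """Classify an SNV given its reference triplet context and alternate base.

    `triplet` is the reference sequence (5' neighbor, mutated base, 3' neighbor)
    on one strand; `alt` is the alternate base at the middle position.
    Returns (mutation_class, is_cpg).
    """
    triplet = triplet.upper()
    alt = alt.upper()
    if len(triplet) != 3 or set(triplet + alt) - set("ACGT"):
        raise ValueError("expected a 3-base ACGT context and an ACGT alt base")
    ref = triplet[1]
    if ref == alt:
        raise ValueError("alt base must differ from the reference base")

    # A site is CpG-associated if the mutated C is followed by G,
    # or (equivalently, on the opposite strand) the mutated G is preceded by C.
    is_cpg = (ref == "C" and triplet[2] == "G") or (ref == "G" and triplet[0] == "C")

    pair = {ref, alt}
    if pair == {"C", "G"}:
        mutation_class = "C<->G transversion"
    elif pair in ({"A", "G"}, {"C", "T"}):
        mutation_class = "transition"
    else:
        mutation_class = "other transversion"
    return mutation_class, is_cpg


if __name__ == "__main__":
    # Toy numbers only; they are not taken from the paper's data.
    print(relative_change(sd_count=16, sd_bp=1000, unique_count=10, unique_bp=1000))
    print(classify_snv("TCA", "G"))
    print(classify_snv("ACG", "T"))
```

Tallying `classify_snv` results separately for SD and unique regions, then comparing the class frequencies, is one simple way to express a mutational-spectrum difference of the kind described above.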