Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
Nat Struct Mol Biol. 2023 Apr;30(4):417-424. doi: 10.1038/s41594-023-00936-6. Epub 2023 Mar 13.
Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs and recurrent sequencing errors. Here, we show that accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting single nucleotide variants within short tandem repeats may originate from error-prone polymerases. Secondary-structure formation promotes single nucleotide variants within palindromic repeats and duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, whereas mutagenesis at Z-DNAs is conspicuously absent.