Bioinformatics and Genomics Graduate Program, Penn State University, University Park, PA 16802, USA.
Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA.
Nucleic Acids Res. 2021 Feb 22;49(3):1497-1516. doi: 10.1093/nar/gkaa1269.
Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at the 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis, with implications for evolution and genetic diseases.
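To make "substitution frequency at single-base resolution" concrete, the following is a minimal sketch, not the authors' pipeline. It assumes pairwise-aligned windows of equal length centered on a locus (for example, human versus orangutan) and reports, for each position, the fraction of comparable windows in which the two bases differ. The function name `substitution_frequency` and the toy sequences are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Only unambiguous bases are compared; gaps ("-") and "N" are skipped.
VALID = set("ACGT")


def substitution_frequency(
    pairs: Sequence[Tuple[str, str]],
) -> List[Optional[float]]:
    """Per-position fraction of aligned windows where the two bases differ.

    `pairs` holds (reference, other) aligned windows of equal length.
    Positions with no comparable bases in any window yield None.
    """
    if not pairs:
        return []
    width = len(pairs[0][0])
    mismatches = [0] * width
    comparable = [0] * width
    for ref, other in pairs:
        if len(ref) != width or len(other) != width:
            raise ValueError("all aligned windows must have the same length")
        for i, (a, b) in enumerate(zip(ref.upper(), other.upper())):
            if a in VALID and b in VALID:
                comparable[i] += 1
                if a != b:
                    mismatches[i] += 1
    return [m / n if n else None for m, n in zip(mismatches, comparable)]


# Toy input (hypothetical sequences, not data from the study).
windows = [
    ("ACGTA", "ACGTA"),
    ("ACGTA", "ATGTA"),
    ("ACGTA", "AT-TA"),
    ("ACNTA", "ACGTC"),
]
freqs = substitution_frequency(windows)
print(freqs)
```

Averaging over many loci of the same non-B DNA type in this way gives one curve per type, which is the kind of positional profile that can then be compared between, say, loops and stems or spacers and repeat arms.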