Department of Biology, Barnard College, Columbia University, New York, NY 10027, USA.
G3 (Bethesda). 2022 Jul 29;12(8). doi: 10.1093/g3journal/jkac150.
Substitutions between closely related noncoding chloroplast DNA sequences are studied with respect to the composition of the 3 bases on each side of the substitution, that is, the hexanucleotide context. Substitution rate varies about 100-fold among contexts, particularly for substitutions of A and T. Rate heterogeneity of transitions differs from that of transversions, resulting in a more than 200-fold variation in the transition:transversion bias. The data are consistent with a CpG effect, and it is shown that both the A + T content and the arrangement of purines/pyrimidines along the same DNA strand are correlated with rate variation. Expected equilibrium A + T content ranges from 36.4% to 82.8% across contexts, while G-C skew ranges from -77.4 to 72.2 and A-T skew ranges from -63.9 to 68.2. The predicted equilibria are associated with specific features of the composition of the hexanucleotide context, and also show close agreement with the observed context-dependent compositions. Finally, by controlling for the composition of nucleotides closer to the substitution site, it is shown that both the third and fourth nucleotides removed on each side of the substitution directly influence substitution dynamics at that site. Overall, the results demonstrate that noncoding sites in different contexts are evolving along very different evolutionary trajectories and that substitution dynamics are far more complex than typically assumed. This has important implications for a number of types of sequence analysis, particularly analyses of natural selection, and the context-dependent substitution matrices developed here can be applied in future analyses.
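As an illustration of how an expected equilibrium composition can be read off a substitution matrix, the following minimal sketch iterates a 4 x 4 matrix of per-step substitution probabilities to its stationary base frequencies and then summarizes them as A + T content, G-C skew, and A-T skew. Several things here are assumptions rather than details from the study: the function names, the toy matrix (an invented F81-style example, not one of the context-dependent matrices described above), and the skew definitions, which use the conventional forms (G - C)/(G + C) and (A - T)/(A + T) scaled to percent.

```python
BASES = "ACGT"  # row/column order assumed throughout this sketch

# Hypothetical row-stochastic matrix: P[i][j] is the probability that base i
# is replaced by base j in one step. Values are invented for illustration.
TOY_MATRIX = [
    [0.94, 0.01, 0.02, 0.03],  # from A
    [0.04, 0.91, 0.02, 0.03],  # from C
    [0.04, 0.01, 0.92, 0.03],  # from G
    [0.04, 0.01, 0.02, 0.93],  # from T
]


def equilibrium(P, max_iter=100_000, tol=1e-12):
    """Stationary base frequencies of P, found by repeated multiplication."""
    pi = [0.25] * 4  # start from uniform composition
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


def composition_summary(pi):
    """A + T content and strand skews (all in percent) from frequencies."""
    a, c, g, t = pi
    return {
        "AT_content": 100 * (a + t),
        "GC_skew": 100 * (g - c) / (g + c),  # assumed definition
        "AT_skew": 100 * (a - t) / (a + t),  # assumed definition
    }


if __name__ == "__main__":
    pi = equilibrium(TOY_MATRIX)
    for base, freq in zip(BASES, pi):
        print(f"{base}: {freq:.3f}")
    for name, value in composition_summary(pi).items():
        print(f"{name}: {value:.1f}")
```

The toy matrix was constructed so that its equilibrium frequencies are A 0.4, C 0.1, G 0.2, T 0.3, which corresponds to 70% A + T. Applying the same two functions to one matrix per hexanucleotide context would give one predicted composition per context, which could then be compared with the observed composition of sites in that context.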