Division of Pharmacology and Toxicology, The University of Texas at Austin, Dell Pediatric Research Institute, Austin, Texas, United States of America; Advanced Biomedical Computing Center, SAIC-Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America.
PLoS Genet. 2013;9(9):e1003816. doi: 10.1371/journal.pgen.1003816. Epub 2013 Sep 26.
Single-base substitutions constitute the most frequent type of human gene mutation and are a leading cause of cancer and inherited disease. These alterations occur non-randomly in DNA, being strongly influenced by the local nucleotide sequence context. However, the molecular mechanisms underlying such sequence context-dependent mutagenesis are not fully understood. Using bioinformatics, computational and molecular modeling analyses, we have determined the frequencies of mutation at G•C bp in the context of all 64 5'-NGNN-3' motifs that contain the mutation at the second position. Twenty-four datasets were employed, comprising >530,000 somatic single-base substitutions from 21 cancer genomes, >77,000 germline single-base substitutions causing or associated with human inherited disease and 16.7 million benign germline single-nucleotide variants. In several cancer types, the number of mutated motifs correlated both with the free energies of base stacking and the energies required for abstracting an electron from the target guanines (ionization potentials). Similar correlations were also evident for the pathological missense and nonsense germline mutations, but only when the target guanines were located on the non-transcribed DNA strand. Likewise, pathogenic splicing mutations predominantly affected positions in which a purine was located on the non-transcribed DNA strand. Novel candidate driver mutations and tissue-specific mutational patterns were also identified in the cancer datasets. We conclude that electron transfer reactions within the DNA molecule contribute to sequence context-dependent mutagenesis, involving both somatic driver and passenger mutations in cancer, as well as germline alterations causing or associated with inherited disease.
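The motif-counting step summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. Each mutation is given as a (sequence, 0-based position) pair on an upper-case 5'->3' string. A mutated C is assigned to the G on the complementary strand, so that every G•C substitution falls into one of the 64 5'-NGNN-3' motifs. The helper names (motif_for_site, count_mutated_motifs) are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List, Optional, Tuple

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string; unknown bases become N."""
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq))


def ngnn_motifs() -> List[str]:
    """All 5'-NGNN-3' motifs: G fixed at position 2, so 4**3 = 64 combinations."""
    return [n1 + "G" + n3 + n4 for n1, n3, n4 in product(BASES, repeat=3)]


def motif_for_site(seq: str, pos: int) -> Optional[str]:
    """Hypothetical helper: NGNN motif around a mutated G•C pair.

    Assumes seq is upper-case and read 5'->3', and pos is the 0-based index
    of the mutated base. A mutated G is read on this strand. A mutated C is
    read as the G on the complementary strand, so its motif is the reverse
    complement of the 5'-NNCN-3' window. Returns None for A/T sites or when
    the four-base window runs off either end of seq.
    """
    base = seq[pos]
    if base == "G" and pos >= 1 and pos + 3 <= len(seq):
        return seq[pos - 1 : pos + 3]
    if base == "C" and pos >= 2 and pos + 2 <= len(seq):
        return reverse_complement(seq[pos - 2 : pos + 2])
    return None


def count_mutated_motifs(sites: List[Tuple[str, int]]) -> Counter:
    """Tally mutated sites per NGNN motif, listing all 64 motifs (zeros kept)."""
    counts = Counter({m: 0 for m in ngnn_motifs()})
    for seq, pos in sites:
        motif = motif_for_site(seq, pos)
        # Skip A/T sites, truncated windows and motifs containing unknown bases.
        if motif is not None and motif in counts:
            counts[motif] += 1
    return counts
```

For example, a C at index 2 of "AACTT" maps to the motif "AGTT" read from the complementary strand. The resulting 64 per-motif counts are the kind of vector that could then be correlated (for example with a Pearson or rank correlation) against per-motif stacking free energies or guanine ionization potentials. Those energy values are not given here.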