Vinogradov A E, Anatskaya O V
Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russia, 194064.
Mamm Genome. 2017 Oct;28(9-10):455-464. doi: 10.1007/s00335-017-9713-8. Epub 2017 Aug 23.
AT-rich DNA is mostly associated with condensed chromatin, whereas GC-rich sequence is preferentially located in dispersed chromatin. AT-rich genes tend to be tissue-specific (silenced in most tissues), while GC-rich genes tend to be housekeeping (expressed in many tissues). This paper reports another important property of DNA base composition, one that can affect the repertoire of genes with high AT content: GC-rich sequence is more liable to mutation. We found that the Spearman correlation between human gene GC content and mutation probability is above 0.9. A change of base composition, even at synonymous sites, affects the mutation probability of nonsynonymous sites and thus of the encoded proteins. There is a unique type of housekeeping gene that is especially unsafe when prone to mutation. For these genes, natural selection, which usually removes deleterious mutations, only increases the hazard because it can descend to the suborganismal (cellular) level. These are cell cycle-related genes. In accordance with the proposed concept, they have low GC content at synonymous sites despite being housekeeping genes. Gene-centred protein interaction enrichment analysis (PIEA) revealed core clusters of genes whose interactants are modularly enriched in genes with AT-rich synonymous codons. This interconnected network is involved in double-strand break repair, DNA integrity checkpoints and chromosome pairing at mitosis. Damage to these genes results in genome and chromosome instability, leading to cancer and other 'error catastrophes'. By reducing nonsynonymous mutations, the usage of AT-rich synonymous codons can decrease the probability of cancer more than 20-fold.
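The two quantities the abstract relies on, GC content at synonymous sites and a Spearman rank correlation, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that GC content at third codon positions (GC3) is an acceptable proxy for synonymous-site GC content, and the function names `gc3`, `average_ranks` and `spearman` are hypothetical helpers introduced only for illustration.

```python
from statistics import mean


def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of a coding sequence.

    Assumption: GC3 is used here as a simple proxy for synonymous-site GC.
    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    thirds = cds[2:usable:3]
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (var_x * var_y) ** 0.5
```

Given per-gene coding sequences and per-gene mutation probabilities from a real dataset, a call such as `spearman([gc3(s) for s in sequences], mutation_probabilities)` would produce a coefficient of the kind the abstract reports. Here `sequences` and `mutation_probabilities` are placeholder names for that input, not data from the paper.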