Bag Sumit K, Paul Sandip, Ghosh Subhagata, Dutta Chitra
Bioinformatics Centre, Indian Institute of Chemical Biology, Kolkata, India.
DNA Res. 2007 Aug 31;14(4):141-54. doi: 10.1093/dnares/dsm015. Epub 2007 Sep 25.
Genome-wide analysis of sequence divergence patterns in 12,024 human-mouse orthologous pairs reveals, for the first time, that the trends in nucleotide and amino acid substitutions in orthologs of high and low GC composition are highly asymmetric and polarized in opposite directions. The dataset was divided into three groups (high, medium, and low) on the basis of the GC content at third codon sites of the human genes. High-GC orthologs exhibit a significant bias in favor of the replacements Thr --> Ala, Ser --> Ala, Val --> Ala, Lys --> Arg, Asn --> Ser, Ile --> Val, etc., from mouse to human, whereas in low-GC orthologs the reverse trends prevail. In general, in the high-GC group, residues encoded by A/U-rich codons in mouse proteins tend to be replaced by residues encoded by relatively G/C-rich codons in their human orthologs, whereas the opposite trend is observed among the low-GC orthologous pairs. The medium-GC group shares some trends with the high-GC group and some with the low-GC group. The only significant trend common to all groups of orthologs, irrespective of their GC bias, is the Asp (mouse) --> Glu (human) replacement. At the nucleotide level, high-GC orthologs have undergone a large excess of A/T (mouse) --> G/C (human) substitutions over G/C (mouse) --> A/T (human) substitutions at each codon position, whereas for low-GC orthologs the reverse is true.
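The grouping and counting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the GC3 cutoffs (0.45 and 0.65) are hypothetical placeholders, since the abstract does not state the thresholds used, and the sequences are assumed to be in-frame, gap-free, and already aligned codon by codon. The sketch only tallies observed mouse-versus-human differences, following the mouse --> human notation of the abstract, and does not infer an ancestral state.

```python
from collections import Counter


def gc3(cds: str) -> float:
    """Fraction of G/C at third codon positions of an in-frame coding sequence."""
    thirds = cds.upper()[2::3]  # every third base, starting at codon position 3
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def gc_group(value: float, low_cut: float = 0.45, high_cut: float = 0.65) -> str:
    """Assign a GC3 value to 'low', 'medium', or 'high'.

    The cutoffs are illustrative assumptions, not values from the paper.
    """
    if value < low_cut:
        return "low"
    if value > high_cut:
        return "high"
    return "medium"


def directional_counts(mouse: str, human: str) -> Counter:
    """Count A/T -> G/C and G/C -> A/T differences per codon position.

    Both sequences are assumed to be aligned, gap-free, and in frame.
    Keys are (direction, codon_position) tuples with positions 1 to 3.
    """
    if len(mouse) != len(human):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for i, (m, h) in enumerate(zip(mouse.upper(), human.upper())):
        if m == h or m not in "ACGT" or h not in "ACGT":
            continue  # skip identical sites and ambiguous bases
        position = i % 3 + 1
        if m in "AT" and h in "GC":
            counts[("AT->GC", position)] += 1
        elif m in "GC" and h in "AT":
            counts[("GC->AT", position)] += 1
    return counts
```

For each orthologous pair, one would compute `gc_group(gc3(human_cds))` to pick the group and then sum the `directional_counts` results within each group to compare the two directions at each codon position.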