Forsdyke D R
Department of Biochemistry, Queen's University, Kingston, Ontario, Canada K7L 3N6.
Bioinformatics. 2002 Jan;18(1):215-7. doi: 10.1093/bioinformatics/18.1.215.
The relative quantities of bases in DNA were determined chemically many years before sequencing technologies permitted direct counting of bases. Apparently unaware of the rich literature on the topic, bioinformaticists are today rediscovering the 'wheels' of Chargaff, Wyatt and other biochemists. It follows from Chargaff's second parity rule (%A = %T, %G = %C for single-stranded DNA) that the symmetries observed for the two pairs of complementary mononucleotide bases should also apply to the eight pairs of complementary dinucleotide bases, the thirty-two pairs of complementary trinucleotide bases, etc. This was made explicit by Prabhu in 1993 in a study of complete genomes and long genome segments from a wide range of taxa, and was rediscovered by Qi and Cuticchia in 2001 in a study of complete genomes. It follows from Chargaff's GC-rule (%GC tends to be uniform and species specific) that, within a species, oligonucleotides of the same GC% will be present in approximately equal quantities in single-stranded DNA. Thus, for example, while quantities of CAT and ATG (reverse complements) will be closely correlated because of both of the above Chargaff rules, CAT and GTA (forward complements) will show some correlation only because of the latter rule. The need for complete genomic sequences in bioinformatic analyses may have been somewhat overplayed.
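The distinction between reverse and forward complements can be made concrete with a short counting routine. The following is a minimal sketch in Python, not the method of any of the cited studies: it assumes a single strand supplied as a plain string over A, C, G and T, and the function names (`forward_complement`, `reverse_complement`, `count_kmers`) are illustrative only.

```python
from collections import Counter

# Watson-Crick pairing used to build complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def forward_complement(oligo: str) -> str:
    """Complement each base without reversing the order (CAT -> GTA)."""
    return oligo.translate(_COMPLEMENT)


def reverse_complement(oligo: str) -> str:
    """Complement each base and reverse the order (CAT -> ATG)."""
    return oligo.translate(_COMPLEMENT)[::-1]


def count_kmers(strand: str, k: int) -> Counter:
    """Count overlapping k-mers along ONE strand.

    Windows containing any symbol other than A, C, G or T are skipped.
    """
    strand = strand.upper()
    counts = Counter()
    for i in range(len(strand) - k + 1):
        window = strand[i:i + k]
        if set(window) <= set("ACGT"):
            counts[window] += 1
    return counts


if __name__ == "__main__":
    # Toy string for illustration only; it is not genomic data and is far
    # too short to say anything about Chargaff's rules.
    toy = "ATGCATGCAT"
    counts = count_kmers(toy, 3)
    print(counts["CAT"],
          counts[reverse_complement("CAT")],   # ATG
          counts[forward_complement("CAT")])   # GTA
    # prints: 2 2 0
```

Applied to a long single-stranded sequence, comparing `counts[x]` with `counts[reverse_complement(x)]` for every trinucleotide `x` tests the second parity rule, while comparing `counts[x]` with `counts[forward_complement(x)]` isolates the weaker agreement expected from shared GC% alone.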