Avery P J
Department of Statistics, University of Newcastle upon Tyne, England.
J Mol Evol. 1987;26(4):335-40. doi: 10.1007/BF02101152.
To examine whether certain short DNA sequences (putative splice signals) occurred in a particular region of an intron more often than would be expected by chance, intron data were first examined to determine their statistical structure. There were significant departures from equal nucleotide frequency, and successive nucleotides clearly did not occur independently in the rat and mouse introns examined. The nonindependence was mainly due to a shortage of CG and a less marked shortage of TA. However, the pairwise frequencies explained almost all the variability in triplet frequencies in the data, so the data could be approximately modeled using nucleotide frequencies conditional on the preceding nucleotide. Some coding DNA was also examined; the pairs in the second and third positions, and in the third and first positions, of a codon showed departures from independence similar to those of the intron data. Using the probability model derived for the intron data, expected frequencies of the putative signals were derived and compared with the observed frequencies.
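The model described in the abstract, nucleotide frequencies conditional on the preceding nucleotide, corresponds to a first-order Markov chain. The following is a minimal sketch of how expected signal frequencies could be computed under such a model; it is not the paper's actual procedure. The function names, the toy sequences, and the example motif are hypothetical, and the estimators are plain relative frequencies, which the abstract does not specify.

```python
from collections import Counter

ALPHABET = "ACGT"


def fit_first_order(seqs):
    """Estimate base frequencies and P(next | previous) from sequences.

    Assumption: simple relative-frequency estimates, no smoothing.
    """
    base = Counter()
    pair = Counter()
    for s in seqs:
        s = s.upper()
        base.update(c for c in s if c in ALPHABET)
        for a, b in zip(s, s[1:]):
            if a in ALPHABET and b in ALPHABET:
                pair[a + b] += 1
    total = sum(base.values())
    freq = {a: base[a] / total for a in ALPHABET}
    trans = {}
    for a in ALPHABET:
        row = sum(pair[a + b] for b in ALPHABET)
        # A base never seen as a predecessor gets an all-zero row.
        trans[a] = {b: (pair[a + b] / row if row else 0.0) for b in ALPHABET}
    return freq, trans, pair


def motif_probability(motif, freq, trans):
    """Probability that a motif starts at a given position under the chain."""
    p = freq[motif[0]]
    for a, b in zip(motif, motif[1:]):
        p *= trans[a][b]
    return p


def expected_count(motif, region_length, freq, trans):
    """Expected number of motif occurrences in a region of the given length."""
    positions = max(region_length - len(motif) + 1, 0)
    return positions * motif_probability(motif, freq, trans)


def pair_obs_over_exp(pair, freq):
    """Observed / expected dinucleotide ratio under independence.

    Values well below 1 indicate a shortage (as reported for CG and TA).
    """
    n_pairs = sum(pair.values())
    return {ab: n / (n_pairs * freq[ab[0]] * freq[ab[1]])
            for ab, n in pair.items()}


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    toy_introns = ["GTAAGTCTTTCCTTACTAACCTTTTCAG", "GTGAGTATTCTGACTTTCTCTCCAG"]
    freq, trans, pair = fit_first_order(toy_introns)
    print(pair_obs_over_exp(pair, freq).get("CG"))
    print(expected_count("CTGAC", 50, freq, trans))
```

Comparing `expected_count` with the number of times a motif is actually observed in the region of interest gives the kind of observed-versus-expected comparison the abstract describes. Because the chain conditions only on the preceding base, it absorbs the CG and TA shortages that an independence model would miss.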