Volinia S, Scapoli C, Gambari R, Barale R, Barrai I
Dipartimento di Biologia Evolutiva, Università di Ferrara, Italy.
Nucleic Acids Res. 1991 Jul 11;19(13):3733-40. doi: 10.1093/nar/19.13.3733.
We studied the frequency distribution of 10-bp oligonucleotides (decamers) in a 620-kb sample of viral genomes, comprising 102 sequences from GenBank, with the aim of detecting transcription control signals. Two thousand three hundred decamers had a frequency 10 times higher than the mean and were subjected to further statistical analysis. For each of these 2300 decamers (parents), we counted the individual frequencies of the 30 decamers that differ from the parent by a single base mutation (progeny) and then calculated two variance/mean chi-square values for the progeny, one with and one without the parent. We then studied the distribution of the ratio between the two chi-square values. Of the 2300 decamers that were 10 times more frequent than average, 479 had a chi-square ratio of 1.9 or larger. In this final set, which corresponds to less than 0.05% of all possible decamers, 58 decamers were found to contain viral and eukaryotic transcription control elements, such as NF-κB, Sp1 and others. Furthermore, this set contains an excess of signals of lengths 5, 6, 7, 8, 9 and 10 when compared with 150 random sets bootstrapped from the same viral genomes.
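The parent/progeny statistic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, and two details are assumptions the abstract does not state: the variance/mean chi-square is taken as the sum of squared deviations from the mean divided by the mean, and the ratio is taken as the value with the parent over the value without it.

```python
from collections import Counter

BASES = "ACGT"


def count_kmers(sequences, k=10):
    """Count every overlapping k-mer made only of A, C, G, T."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set(BASES):  # skip windows with ambiguous bases
                counts[kmer] += 1
    return counts


def single_base_progeny(parent):
    """Return all sequences differing from the parent at exactly one position.

    A decamer has 10 positions x 3 alternative bases = 30 progeny.
    """
    progeny = []
    for i, base in enumerate(parent):
        for alt in BASES:
            if alt != base:
                progeny.append(parent[:i] + alt + parent[i + 1:])
    return progeny


def dispersion_chi_square(values):
    """Variance/mean chi-square (assumed form): sum((x - mean)^2) / mean."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return sum((v - mean) ** 2 for v in values) / mean


def chi_square_ratio(parent, counts):
    """Ratio of the chi-square with the parent to the one without it.

    The direction of the ratio is an assumption made for this sketch.
    """
    progeny_counts = [counts[p] for p in single_base_progeny(parent)]
    with_parent = dispersion_chi_square(progeny_counts + [counts[parent]])
    without_parent = dispersion_chi_square(progeny_counts)
    if without_parent == 0:
        return float("inf") if with_parent > 0 else 0.0
    return with_parent / without_parent
```

Under these assumptions, a parent that is far more frequent than its 30 single-base variants inflates the chi-square computed with the parent, so decamers could be screened by keeping those with `chi_square_ratio(parent, counts) >= 1.9`, the cutoff reported in the abstract. The size of the search space is easy to verify: there are 4**10 = 1,048,576 possible decamers, so 479 of them is about 0.046%, consistent with the stated "less than 0.05%".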