Volinia S, Scapoli C, Gambari R, Barale R, Barrai I
Dipartimento di Biologia Evolutiva e Istituto di Chimica Biologica-Università di Ferrara, Italy.
Nucleic Acids Res. 1992 Feb 11;20(3):551-6. doi: 10.1093/nar/20.3.551.
We studied the frequency distribution of oligonucleotides 10 bp long in a sample of 1.6 Mb of mammalian genes, comprising 579 sequences from GenBank(R) 55.0, with the aim of detecting transcription control signals. A total of 2216 decamers had a frequency higher than 10 times the mean and were subjected to further statistical analysis. For each of these 2216 decamers (parents), we counted the individual frequencies of the 30 decamers that differ from the parent by a single base substitution (progeny), and then calculated two variance/mean chi squares for the progeny, one with and one without the parent. We then studied the distribution of the ratio between the two chi squares. Of the 2216 decamers, 346 had a chi square ratio of 1.9 or larger. In this final set, which corresponds to less than 0.033 per cent of all possible decamers, 18 decamers were found to contain 23 eukaryotic transcription control elements 5-10 bp in length, such as Sp1 and others. Furthermore, when compared with 210 random sets each containing 346 decamers, this set contains a highly significant excess of the longer signals.
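The parent/progeny screen described above can be sketched in Python. This is a minimal, hypothetical illustration on toy data, not the authors' program, and every function name in it is hypothetical. Several details are assumptions not stated in the abstract: decamers are counted on one strand only (no reverse complement), windows containing non-ACGT characters are skipped, the "mean" in the 10-fold cut-off is taken over all 4^10 = 1,048,576 possible decamers, the variance/mean chi square is taken to be the standard index-of-dispersion statistic sum((x - mean)^2) / mean, and the ratio is taken as the chi square with the parent divided by the chi square without it. For scale, 346 out of 1,048,576 possible decamers is just under 0.033 per cent.

```python
from collections import Counter

BASES = "ACGT"
K = 10  # decamers, as in the abstract


def count_kmers(sequences, k=K):
    """Count overlapping k-mers on the given strand only (assumption: no reverse complement).

    Windows containing characters other than A, C, G, T are skipped (assumption).
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if all(b in BASES for b in word):
                counts[word] += 1
    return counts


def one_base_mutants(word):
    """Return the 3 * len(word) words differing from `word` at exactly one position (30 for a decamer)."""
    mutants = []
    for i, base in enumerate(word):
        for alt in BASES:
            if alt != base:
                mutants.append(word[:i] + alt + word[i + 1:])
    return mutants


def dispersion_chi2(values):
    """Variance/mean chi square, assumed to be the index of dispersion: sum((x - mean)^2) / mean."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return sum((x - mean) ** 2 for x in values) / mean


def chi2_ratio(counts, parent):
    """Chi square of progeny plus parent divided by chi square of progeny alone (assumed direction)."""
    progeny = [counts.get(m, 0) for m in one_base_mutants(parent)]
    chi2_without = dispersion_chi2(progeny)
    chi2_with = dispersion_chi2(progeny + [counts.get(parent, 0)])
    if chi2_without == 0:
        # All progeny counts are equal: any deviation of the parent makes the ratio unbounded.
        return float("inf") if chi2_with > 0 else 1.0
    return chi2_with / chi2_without


def select_parents(counts, k=K, fold=10.0):
    """Keep k-mers whose count exceeds `fold` times the mean over all 4**k possible k-mers (assumed mean)."""
    mean = sum(counts.values()) / 4 ** k
    return [w for w, c in counts.items() if c > fold * mean]


def screen(sequences, k=K, fold=10.0, threshold=1.9):
    """Return (parent, ratio) pairs with ratio >= threshold, highest ratio first."""
    counts = count_kmers(sequences, k)
    scored = [(p, chi2_ratio(counts, p)) for p in select_parents(counts, k, fold)]
    hits = [(p, r) for p, r in scored if r >= threshold]
    return sorted(hits, key=lambda pr: pr[1], reverse=True)
```

Calling `screen(sequences)` on a list of sequence strings returns candidate decamers with their chi square ratios. A parent that is much more frequent than its 30 one-base neighbours inflates the chi square computed with it included, which is what drives the ratio up. On a sample far smaller than the 1.6 Mb set described above, nearly every decamer that occurs even once clears the 10-fold cut-off and, with mostly zero-count neighbours, gets an unbounded ratio, so the sketch only illustrates the mechanics of the two filters.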