May L T, Landsberger F R, Inouye M, Sehgal P B
Proc Natl Acad Sci U S A. 1985 Jun;82(12):4090-4. doi: 10.1073/pnas.82.12.4090.
The nucleotide sequence of a 14-kilobase (kb) region of the human beta interferon (IFN-beta)-related DNA locus on chromosome 2 (genomic DNA clone lambda B3) was determined and compared to that of the IFN-beta 1 gene by using the Sellers TT algorithm, which aligns segments of one sequence with similar segments in a second sequence. A strategy was developed for assessing the significance of similarities between DNA sequences, based on a scheme that recognizes patterns, or runs, of identities within an alignment. The pattern score (II) thus obtained is an entropy-like measure. Numerically, it reflects the length of the second longest run of identity in an alignment plus a correction factor for the other, shorter identity runs in the alignment. When the IFN-beta 1 gene is compared to random nucleotide sequences, the distribution of II scores fits a Gaussian function. This strategy was used to identify seven segments along one strand of lambda B3 DNA that are related to segments in IFN-beta 1; these seven alignments have II scores at least 3 standard deviations above the mean score obtained in comparisons between IFN-beta 1 and random nucleotide sequences. One of these alignments (section 7) has a II score 8.02 standard deviations above this mean. The likelihood of finding an alignment as good as that in section 7 in a random sequence the length of the human genome is approximately 10^-7. Furthermore, in computer searches of mammalian nucleotide sequence databases, the lambda B3 DNA sequence in section 7 selects the human IFN-beta 1 gene as its most significant alignment.
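The significance test described in the abstract (score an alignment, then express that score in standard deviations above the mean obtained against random sequences) can be illustrated with a minimal Python sketch. The abstract gives neither the formula for the pattern score II nor the details of the Sellers TT algorithm, so this sketch substitutes a simpler stand-in statistic, the longest run of identities in an ungapped position-by-position comparison. The function names `identity_runs`, `longest_run`, `null_scores`, and `z_score` are hypothetical and do not come from the paper.

```python
import random
import statistics


def identity_runs(a, b):
    """Return the lengths of consecutive identical positions in two aligned strings.

    Gap characters ('-') never count as identities.
    """
    runs, current = [], 0
    for x, y in zip(a, b):
        if x == y and x != "-":
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def longest_run(a, b):
    """Stand-in statistic (NOT the paper's II score): the longest identity run."""
    return max(identity_runs(a, b), default=0)


def null_scores(query, n_trials, rng):
    """Score the query against uniformly random nucleotide sequences of equal length."""
    scores = []
    for _ in range(n_trials):
        rand_seq = "".join(rng.choice("ACGT") for _ in query)
        scores.append(longest_run(query, rand_seq))
    return scores


def z_score(observed, null):
    """Number of standard deviations by which `observed` exceeds the null mean."""
    return (observed - statistics.mean(null)) / statistics.stdev(null)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    query = "ACGTTGCAACGTTGCAACGT"  # made-up sequence, for illustration only
    null = null_scores(query, 1000, rng)
    print(z_score(longest_run(query, query), null))
```

Under this scheme, an alignment would be reported as significant when its z-score is at least 3, mirroring the threshold quoted in the abstract. `z_score` needs at least two null scores that are not all equal, since the sample standard deviation is otherwise undefined or zero.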