Korn L J, Queen C L, Wegman M N
Proc Natl Acad Sci U S A. 1977 Oct;74(10):4401-5. doi: 10.1073/pnas.74.10.4401.
We describe a computer program designed to facilitate the analysis of nucleic acid sequences. The program can search several nucleic acid sequences for oligonucleotides common to all of them. It can examine a DNA or RNA sequence for two kinds of homologous regions: repetitions and dyad symmetries. The homologies need not be perfect: mismatches and "looping out" of nucleotides are allowed. The program also finds (A+T)- and (G+C)-rich regions, locates restriction enzyme recognition sites, determines the distribution of di- and trinucleotides, and performs various other functions. We include two representative applications of the program. All prokaryotic transcription termination sequences published as of June 1977 were found to share the following features: (i) a string of at least five T residues, (ii) the sequence CGGGC or a close analog immediately preceding the T cluster, (iii) a region of strong dyad symmetry preceding the Ts and including the CGGGC sequence. A sequence of 221 nucleotides consisting of the Escherichia coli trp promoter, operator, and leader was found to contain two strong dyad symmetries. These homologies both occur at known regulatory sites; no comparable homologies occur in regions without regulatory significance.
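The searches described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' program: all function names, parameters, and the toy sequence are hypothetical. The dyad symmetry search here tolerates mismatches but does not model "looping out", the spacer allowed between the two arms is an assumption of the sketch, and the terminator check matches only the exact CGGGC motif rather than close analogs.

```python
import re
from collections import Counter

# Translation table for DNA complements (hypothetical helper, DNA only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def common_oligos(seqs, k):
    """Return the set of k-mers that occur in every sequence in seqs."""
    kmer_sets = [{s[i:i + k] for i in range(len(s) - k + 1)} for s in seqs]
    return set.intersection(*kmer_sets)


def dyad_symmetries(seq, arm, max_loop=10, max_mismatch=0):
    """Find pairs of arms of length `arm` where the right arm is the
    reverse complement of the left arm, allowing up to max_mismatch
    mismatches and 0..max_loop unpaired bases between the arms.
    Returns a list of (left_start, right_start, mismatches)."""
    hits = []
    n = len(seq)
    for i in range(n - 2 * arm + 1):
        left = seq[i:i + arm]
        for loop in range(max_loop + 1):
            j = i + arm + loop
            if j + arm > n:
                break
            right_rc = reverse_complement(seq[j:j + arm])
            mismatches = sum(a != b for a, b in zip(left, right_rc))
            if mismatches <= max_mismatch:
                hits.append((i, j, mismatches))
    return hits


def kmer_counts(seq, k):
    """Count overlapping k-mers (k=2 for dinucleotides, k=3 for trinucleotides)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def terminator_motif_start(seq):
    """Return the start index of CGGGC immediately followed by at least
    five Ts, or None. Exact motif only; close analogs are not handled."""
    match = re.search(r"CGGGCT{5,}", seq)
    return match.start() if match else None


# Invented toy sequence: a 6-base stem, a 4-base spacer, then a T run.
toy = "AAGCCCGCTTTTGCGGGCTTTTTT"
symmetries = dyad_symmetries(toy, arm=6, max_loop=6)
motif_at = terminator_motif_start(toy)
```

In the toy sequence, the arm `GCCCGC` pairs with `GCGGGC` across the four-base spacer, and the second arm contains `CGGGC` directly ahead of the T run, which mirrors the three terminator features listed above. Raising `max_mismatch` relaxes the requirement for perfect symmetry.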