Blaisdell B E, Rudd K E, Matin A, Karlin S
Mathematics Department, Stanford University, CA 94305.
J Mol Biol. 1993 Feb 20;229(4):833-48. doi: 10.1006/jmbi.1993.1090.
New computer and statistical methods were used to determine significant direct and inverted repeats in the Escherichia coli contig sequence collection, which totals 1.6 × 10⁶ base pairs. Eight groups of mostly new structural repeat identities were uncovered. Apart from the high statistical significance of these repeat sequences, there are suggestive relationships among the group matches in terms of neighboring genes, genomic distributions, their texts, and their potentials for secondary structure. Four of these groups are relatively numerous, with 11 to 26 members: one is in coding sequences and three are in non-coding sequences. The coding group consists of the ATP-activated transmembrane component of a typical high-affinity protein-binding transport system. One of the non-coding groups consists of a special rho-independent transcription termination signal closely following an operon. The gene neighbors of this group often appear to be involved in some way in processing RNA or DNA. In a second non-coding group, one or both neighboring genes encode a component of a system responding to stress or to starvation for some nutrient.
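The abstract does not describe the repeat-finding procedure itself. As a rough illustration of what "direct" and "inverted" repeats mean at the sequence level, the following is a minimal sketch that indexes exact k-mers and pairs up matching positions. It is not the authors' method: the function names (`reverse_complement`, `find_repeats`), the exact-match criterion, and the fixed word length `k` are all assumptions made for illustration, and the sketch makes no attempt at the statistical significance assessment the paper relies on.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (assumes an A/C/G/T alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_repeats(seq, k):
    """Find exact k-mer repeats in seq (hypothetical helper, illustration only).

    Returns (direct, inverted), each a sorted list of (i, j) start positions
    with i < j. A direct repeat has the same k-mer at i and j; an inverted
    repeat has a k-mer at i and its reverse complement at j.
    A k-mer equal to its own reverse complement is reported in both lists.
    """
    seq = seq.upper()

    # Index every k-mer by the positions at which it starts.
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    direct = []
    inverted = []
    for kmer, starts in positions.items():
        # Direct repeats: every pair of occurrences of the same k-mer.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                direct.append((starts[a], starts[b]))

        # Inverted repeats: occurrences of the reverse complement downstream.
        rc_starts = positions.get(reverse_complement(kmer), [])
        for i in starts:
            for j in rc_starts:
                if i < j:
                    inverted.append((i, j))

    return sorted(direct), sorted(inverted)
```

Calling `find_repeats(sequence, k)` on a contig string returns the two lists of position pairs. Any real analysis would still need to allow mismatches, extend matches beyond a fixed length, and judge significance against a background model.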