Collado-Vides J
Centro de Investigación sobre Fijación de Nitrógeno, Universidad Nacional Autónoma de México, Cuernavaca, Morelos.
Biosystems. 1993;29(2-3):87-104. doi: 10.1016/0303-2647(93)90086-r.
The inadequacy of context-free grammars for describing the regulatory information contained in DNA gave the formal justification for a linguistic approach to the study of gene regulation. Based on that result, we have begun a linguistic formalization of the regulatory arrays of 107 sigma 70 promoters of E. coli. The complete sequences of promoters (Pr), operators (Op) and activator binding sites (I) have previously been identified as the smallest elements, or categories, for a combinatorial analysis of the range of transcription initiation of sigma 70 promoters. These categories are conceptually equivalent to the phonemes of natural language. A complete description of the regulatory arrays of promoters requires several features associated with these categories, so we must select the best way to describe the properties pertinent to such regulatory regions. In this paper we define distinctive features of regulatory regions based on the following criteria: identification of subclasses of substitutable elements, simplicity, selection of the most directly related information, and distinction of one array from the whole set of promoters. Alternative ways to represent distances between regulatory sites are discussed; together with a principle of precedence, they permit the identification of an ordered set of complex symbols as a unique representation of a promoter and its associated regulatory sites. In the accompanying paper additional distinctive features of promoters and regulatory sites are identified.
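The idea of an ordered set of complex symbols can be illustrated with a minimal sketch. It is not the authors' formalism: the abstract does not define the principle of precedence or the distance measure, so the sketch assumes that sites are ordered from upstream to downstream and that distance is measured between site centers relative to the promoter. The names `Site` and `complex_symbols` and all coordinates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    category: str  # "Pr", "Op" or "I", the three categories named in the abstract
    center: float  # hypothetical: site center relative to the transcription start


def complex_symbols(sites):
    """Return an ordered tuple of (category, distance to promoter) symbols."""
    promoters = [s for s in sites if s.category == "Pr"]
    if len(promoters) != 1:
        raise ValueError("this sketch expects exactly one promoter per array")
    pr = promoters[0]
    # Assumed precedence rule: upstream-to-downstream order, ties broken by category.
    ordered = sorted(sites, key=lambda s: (s.center, s.category))
    # Assumed distance measure: offset of each site center from the promoter center.
    return tuple((s.category, s.center - pr.center) for s in ordered)


# Invented coordinates, not taken from the 107-promoter collection.
array = [Site("Op", 5.0), Site("Pr", -22.5), Site("I", -61.5)]
symbols = complex_symbols(array)
print(symbols)  # (('I', -39.0), ('Pr', 0.0), ('Op', 27.5))
```

Because the sites are sorted before the symbols are built, the result does not depend on the order in which the sites are listed, which is the sense in which such a representation can be unique for a given array.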