Rosenblueth D A, Thieffry D, Huerta A M, Salgado H, Collado-Vides J
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Ciudad Universitaria, México D.F.
Comput Appl Biosci. 1996 Oct;12(5):415-22. doi: 10.1093/bioinformatics/12.5.415.
Weight matrices are among the most common methods for identifying cis-regulatory sites in regulatory regions of DNA, as several articles in this issue attest. One way to strengthen computational predictions in regulatory regions is to develop methods that incorporate more of the biological properties of such DNA regions. The grammatical implementation presented in this paper provides a concrete example in this direction.
On the basis of an analysis of an exhaustive collection of regulatory regions in Escherichia coli, a grammatical model for the regulatory regions of sigma 70 promoters has been developed. The terminal symbols of the grammar represent individual binding sites for activator and repressor proteins and include the precise position of each site relative to transcription initiation. By combining these symbols, the grammar generates a large number of different sentences, each of which can be matched against a collection of regulatory regions using weight matrices specific to the set of sites for each individual protein. A Prolog syntactic recognizer based on this grammatical model is presented here. Specific subgrammars for ArgR, LexA and TyrR were implemented. When parsing a collection of 128 sigma 70 promoter regions, the syntactic recognizer produces far fewer false-positive sites than the standard weight-matrix search.
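The contrast between a plain weight-matrix search and a search constrained by site position can be illustrated with a minimal Python sketch. It is not the paper's Prolog recognizer: the aligned example sites, the score threshold, the toy sequence, the transcription start index and the allowed position window are all hypothetical values chosen for illustration, and the function names are assumptions. The sketch builds a log-odds matrix with a uniform background, scans a sequence, and then keeps only hits whose position relative to transcription initiation falls inside the window, which is one simple way a positional constraint can discard candidates that the matrix alone would report.

```python
import math

BASES = "ACGT"


def log_odds_matrix(sites, pseudocount=1.0):
    """Build a log2-odds weight matrix from aligned sites (uniform background)."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = sum(matrix[i][sequence[start + i]] for i in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits


def positional_filter(hits, tss_index, window):
    """Keep hits whose start, relative to the transcription start, lies in window."""
    low, high = window
    return [
        (start - tss_index, score)
        for start, score in hits
        if low <= start - tss_index <= high
    ]


# Hypothetical aligned sites for one protein (illustrative, not real data).
example_sites = ["TACTGT", "TACTGT", "AACTGT", "TACTGA"]
matrix = log_odds_matrix(example_sites)

# Toy region with two matching windows; index 28 is assumed to be the
# transcription start, and sites are assumed valid only from -15 to 0.
region = "GGTACTGTCCGGCCGGCCGGTACTGTGGCC"
hits = scan(region, matrix, threshold=5.0)
kept = positional_filter(hits, tss_index=28, window=(-15, 0))

print("matrix-only hits:", [start for start, _ in hits])
print("hits kept by the positional constraint:", [pos for pos, _ in kept])
```

In this toy case the matrix alone reports both matching windows, while the positional constraint keeps only the one near the assumed transcription start. A grammar in the sense of the abstract would go further by combining several such position-tagged symbols into sentences.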