Identification of common motifs in unaligned DNA sequences: application to Escherichia coli Lrp regulon.

Authors

Fraenkel Y M, Mandel Y, Friedberg D, Margalit H

Affiliation

Department of Molecular Genetics, Hebrew University-Hadassah Medical School, Jerusalem, Israel.

Publication

Comput Appl Biosci. 1995 Aug;11(4):379-87. doi: 10.1093/bioinformatics/11.4.379.

Abstract

We describe a relatively simple method for the identification of common motifs in DNA sequences that are known to share a common function. The input sequences are unaligned and there is no information regarding the position or orientation of the motif. Often such data exists for protein-binding regions, where genetic or molecular information that defines the binding region is available, but the specific recognition site within it is unknown. The method is based on the principle of 'divide and conquer'; we first search for dominant submotifs and then build full-length motifs around them. This method has several useful features: (i) it screens all submotifs so that the results are independent of the sequence order in the data; (ii) it allows the submotifs to contain spacers; (iii) it identifies an existing motif even if the data contains 'noise'; (iv) its running time depends linearly on the total length of the input. The method is demonstrated on two groups of protein-binding sequences: a well-studied group of known CRP-binding sequences, and a relatively newly identified group of genes known to be regulated by Lrp. The Lrp motif that we identify, based on 23 gene sequences, is similar to a previously identified motif based on a smaller data set, and to a consensus sequence of experimentally defined binding sites. Individual Lrp sites are evaluated and compared in regard to their regulation mode.
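The first stage described above (screening all submotifs, allowing spacers, with motif orientation unknown) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the representation of a submotif as two fixed-length halves separated by a spacer, the default parameters `half=3` and `max_gap=2`, and the scoring by number of supporting sequences are all assumptions made for illustration. The second stage, building full-length motifs around the dominant submotifs, is not shown.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (assumes an ACGT alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gapped_submotifs(seq, half=3, max_gap=2):
    """Yield (left, gap, right) for every window in seq.

    A submotif here is two words of length `half` separated by a spacer
    of 0..max_gap unspecified bases (a hypothetical representation).
    """
    for gap in range(max_gap + 1):
        span = 2 * half + gap
        for i in range(len(seq) - span + 1):
            left = seq[i:i + half]
            right = seq[i + half + gap:i + span]
            yield (left, gap, right)


def dominant_submotifs(sequences, half=3, max_gap=2):
    """Rank submotifs by the number of input sequences that contain them.

    Both strands are scanned because the motif orientation is unknown.
    Every submotif is counted, so the ranking does not depend on the
    order of the input sequences. For fixed `half` and `max_gap`, the
    work grows linearly with the total input length.
    """
    support = defaultdict(set)
    for idx, seq in enumerate(sequences):
        seq = seq.upper()
        for strand in (seq, reverse_complement(seq)):
            for key in gapped_submotifs(strand, half, max_gap):
                support[key].add(idx)
    # Most widely shared submotifs first; ties broken by key for determinism.
    return sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))


if __name__ == "__main__":
    # Toy data: ACG..GCA is planted in each sequence, the third one on
    # the opposite strand.
    toy = ["TTACGTTGCATT", "GGGACGAAGCAG", "GGTGCGGCGTGG"]
    for (left, gap, right), seqs in dominant_submotifs(toy)[:4]:
        print(left + "N" * gap + right, sorted(seqs))
```

Counting the set of supporting sequences, rather than raw occurrences, keeps a submotif that is repeated many times in one sequence from outranking one that is shared across the data set. This gives some tolerance to "noise" sequences that lack the motif.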
