Parker Stephen C J, Harlap Aaron, Tullius Thomas D
Bioinformatics Program, Boston University, Boston, MA, USA.
Methods Mol Biol. 2011;759:367-79. doi: 10.1007/978-1-61779-173-4_21.
The rapidly increasing availability of DNA sequence data from modern high-throughput experimental techniques has created the need for computational algorithms to aid in motif discovery in genomic DNA. Such algorithms are typically used to find a statistical representation of the nucleotide sequence of the target site of a DNA-binding protein within a collection of DNA sequences that are thought to contain segments to which the protein binds. A major assumption of these algorithms is that the protein recognizes the primary order of nucleotides in the sequence. However, proteins can also recognize the three-dimensional shape and structure of DNA. To account for this, we developed a computational method to predict the local structural profiles of any set of DNA sequences and then to search within these profiles for common DNA structural motifs. Here we describe the details of this method and use it to find a DNA structural motif in the Saccharomyces cerevisiae genome that is associated with binding of the transcription factor RLM1, a component of the protein kinase C-mediated MAP kinase pathway.
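The two steps named in the abstract (predict a local structural profile for each sequence, then search the profiles for a shared motif) can be illustrated with a minimal sketch. It is not the authors' implementation. The per-dinucleotide scores are arbitrary placeholders rather than measured structural values, the function names are hypothetical, and the search is a simple exhaustive scan anchored on the first sequence, swapped in for whatever search procedure the chapter actually uses.

```python
from itertools import product


def toy_step_scores():
    """Return a HYPOTHETICAL score for each dinucleotide step.

    The numbers are arbitrary placeholders (evenly spaced in [0, 1]),
    not experimental or published structural values.
    """
    steps = ["".join(pair) for pair in product("ACGT", repeat=2)]
    return {step: i / (len(steps) - 1) for i, step in enumerate(steps)}


def structural_profile(seq, scores):
    """Convert a DNA sequence into one score per dinucleotide step."""
    seq = seq.upper()
    return [scores[seq[i:i + 2]] for i in range(len(seq) - 1)]


def window_distance(a, b):
    """Euclidean distance between two equal-length profile windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def best_shared_window(profiles, width):
    """Find the window of the first profile that is matched most closely
    by some window in every other profile.

    Returns (summed distance, list of window start positions, one per
    profile). This is an exhaustive scan, assumed here for illustration.
    """
    reference = profiles[0]
    best = None
    for i in range(len(reference) - width + 1):
        ref_window = reference[i:i + width]
        total = 0.0
        starts = [i]
        for profile in profiles[1:]:
            # Closest window in this profile to the reference window.
            dist, j = min(
                (window_distance(ref_window, profile[j:j + width]), j)
                for j in range(len(profile) - width + 1)
            )
            total += dist
            starts.append(j)
        if best is None or total < best[0]:
            best = (total, starts)
    return best


# Made-up example sequences that all contain the 7-mer GATTACA.
sequences = ["CCGATTACACC", "TTTGATTACAG", "GATTACATTTT"]
scores = toy_step_scores()
profiles = [structural_profile(s, scores) for s in sequences]
print(best_shared_window(profiles, width=6))
```

Because the three made-up sequences share an identical 7-mer, the scan reports a summed distance of zero at the positions where that 7-mer starts. With real structural predictions, different nucleotide sequences can yield similar profiles, which is the case a sequence-only motif finder would miss.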