Hughes J D, Estep P W, Tavazoie S, Church G M
Department of Genetics, Harvard Medical School, 200 Longwood Ave, Boston, MA 02115, USA.
J Mol Biol. 2000 Mar 10;296(5):1205-14. doi: 10.1006/jmbi.2000.3519.
AlignACE is a Gibbs sampling algorithm for identifying motifs that are over-represented in a set of DNA sequences. When used to search upstream of apparently coregulated genes, AlignACE finds motifs that often correspond to the DNA binding preferences of transcription factors. We previously used AlignACE to analyze whole-genome mRNA expression data. Here, we present a more detailed study of its effectiveness as applied to a variety of groups of genes in the Saccharomyces cerevisiae genome. Published functional catalogs of genes and sets of genes grouped by common name provided 248 groups, resulting in 3311 motifs. In conjunction with this analysis, we present measures for gauging the tendency of a motif to target a given set of genes relative to all other genes in the genome and for gauging the degree to which a motif is preferentially located in a certain distance range upstream of translational start sites. We demonstrate improved methods for comparing and clustering sequence motifs. Many previously identified cis-regulatory elements were found. We also describe previously unidentified motifs, one of which has been verified by experiments in our laboratory. An extensive set of AlignACE runs on randomly selected sets of genes and on sets of genes whose upstream regions contain known transcription factor binding sites serves as controls.
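The abstract does not state the form of the measure that gauges how strongly a motif targets a gene group relative to the rest of the genome. As a minimal sketch, assuming that measure can be modeled as a hypergeometric tail probability (the chance of seeing at least the observed number of motif-bearing genes in the group if group membership were random), it could be computed as follows. The function name `group_specificity` and all parameter names are hypothetical and do not come from AlignACE itself.

```python
from math import comb


def group_specificity(genome_size, group_size, genome_hits, group_hits):
    """Hypergeometric tail probability P(X >= group_hits).

    Assumed model (not taken from the abstract): the gene group is a random
    draw of `group_size` genes from `genome_size` genes, of which
    `genome_hits` carry the motif upstream. A small value suggests the motif
    is concentrated in the group rather than spread across the genome.
    """
    # The group cannot contain more motif-bearing genes than exist overall.
    upper = min(group_size, genome_hits)
    total = comb(genome_size, group_size)
    tail = sum(
        comb(genome_hits, k) * comb(genome_size - genome_hits, group_size - k)
        for k in range(group_hits, upper + 1)
    )
    return tail / total


# Illustrative numbers only: 6000 genes, a 20-gene group,
# 50 motif-bearing genes genome-wide, 8 of them inside the group.
score = group_specificity(6000, 20, 50, 8)
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the individual terms; only the final ratio is converted to a float.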