Department of Genetics, Washington University Medical School, St Louis, MO, USA.
Nucleic Acids Res. 2012 Jan;40(Database issue):D162-8. doi: 10.1093/nar/gkr1180. Epub 2011 Dec 2.
Saccharomyces cerevisiae is a primary model for studies of transcriptional control, and the specificities of most yeast transcription factors (TFs) have been determined by multiple methods. However, it is unclear which position weight matrices (PWMs) are most useful; for the roughly 200 TFs in yeast, there are over 1200 PWMs in the literature. To address this issue, we created ScerTF, a comprehensive database of 1226 motifs from 11 different sources. We identified a single matrix for each TF that best predicts in vivo data by benchmarking matrices against chromatin immunoprecipitation and TF deletion experiments. We also used in vivo data to optimize thresholds for identifying regulatory sites with each matrix. To correct for biases from different methods, we developed a strategy to combine matrices. These aligned matrices outperform the best available matrix for several TFs. We used the matrices to predict co-occurring regulatory elements in the genome and identified many known TF combinations. In addition, we predict new combinations and provide evidence of combinatorial regulation from gene expression data. The database is available through a web interface at http://ural.wustl.edu/ScerTF. The site allows users to search the database with a regulatory site or matrix to identify the TFs most likely to bind the input sequence.
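As a rough illustration of what identifying regulatory sites with a matrix and a score threshold involves, the following minimal Python sketch converts base counts into a log-odds PWM and scans both strands of a sequence. It is not the ScerTF implementation: the 4-bp count matrix, the pseudocount, the uniform background, the threshold of 5.0, and all function names are hypothetical choices made only for this example.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix

def score_site(site, matrix):
    """Sum the weight of the observed base at each matrix position."""
    return sum(column[base] for base, column in zip(site, matrix))

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def scan(sequence, matrix, threshold):
    """Report every window on either strand that scores at or above threshold.

    Positions on the "-" strand are indices into the reverse-complemented
    sequence, not into the input sequence.
    """
    width = len(matrix)
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for i in range(len(seq) - width + 1):
            site = seq[i:i + width]
            score = score_site(site, matrix)
            if score >= threshold:
                hits.append((strand, i, site, score))
    return hits

# Hypothetical counts for a toy 4-bp motif (consensus TGAC), not a real yeast TF.
TOY_COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
]

toy_matrix = log_odds_matrix(TOY_COUNTS)
hits = scan("AATGACTT", toy_matrix, threshold=5.0)
for strand, position, site, score in hits:
    print(strand, position, site, round(score, 2))
```

In this sketch the threshold is a fixed number chosen by hand. The abstract describes thresholds optimized per matrix against in vivo data, which would replace the constant used here.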