Ward Lucas D, Bussemaker Harmen J
Department of Biological Sciences, Columbia University, New York, NY 10027, USA.
Bioinformatics. 2008 Jul 1;24(13):i165-71. doi: 10.1093/bioinformatics/btn154.
The identification of transcription factor (TF) binding sites and the regulatory circuitry that they define is currently an area of intense research. Data from whole-genome chromatin immunoprecipitation (ChIP-chip), whole-genome expression microarrays, and sequencing of multiple closely related genomes have all proven useful. By and large, existing methods treat the interpretation of functional data as a classification problem (between bound and unbound DNA), and the analysis of comparative data as a problem of local alignment (to recover phylogenetic footprints of presumably functional elements). Both of these approaches suffer from the inability to model and detect low-affinity binding sites, which have recently been shown to be abundant and functional.
We have developed a method that discovers functional regulatory targets of TFs by predicting the total affinity of each promoter for those factors and then comparing that affinity across orthologous promoters in closely related species. At each promoter, we consider the minimum affinity among orthologs to be the fraction of the affinity that is functional. Because we calculate the affinity of the entire promoter, our method is independent of local alignment. By comparing with functional annotation information and gene expression data in Saccharomyces cerevisiae, we have validated that this biophysically motivated use of evolutionary conservation gives rise to dramatic improvement in prediction of regulatory connectivity and factor-factor interactions compared to the use of a single genome. We propose novel biological functions for several yeast TFs, including the factors Snt2 and Stb4, for which no function has been reported. Our affinity-based approach towards comparative genomics may allow a more quantitative analysis of the principles governing the evolution of non-coding DNA.
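The two steps described above (total promoter affinity, then the minimum across orthologs) can be illustrated with a minimal sketch. It assumes each factor is represented by a position-specific affinity matrix of relative affinities, with the best base at each position set to 1.0. The matrix values, species labels, sequences, and function names below are hypothetical and are not taken from the MatrixREDUCE package. The sketch scans the forward strand only, a simplification.

```python
from math import prod

# Hypothetical affinity matrix for a 3-bp motif "ACG": one dict per motif
# position, giving the relative affinity of each base (optimal base = 1.0).
PSAM = [
    {"A": 1.0, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 1.0, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 1.0, "T": 0.1},
]


def total_affinity(seq, psam):
    """Sum the relative affinity of every window in the promoter.

    Each window's affinity is the product of per-position relative
    affinities, so weak sites contribute small but nonzero amounts.
    Bases outside A/C/G/T (e.g. N) give the window zero affinity.
    """
    width = len(psam)
    return sum(
        prod(psam[j].get(seq[i + j], 0.0) for j in range(width))
        for i in range(len(seq) - width + 1)
    )


def conserved_affinity(orthologs, psam):
    """Take the minimum total affinity among orthologous promoters."""
    return min(total_affinity(seq, psam) for seq in orthologs.values())


# Illustrative orthologous promoters (made-up sequences).
orthologs = {"species_1": "TTACGTT", "species_2": "TTACATT"}
per_species = {name: total_affinity(seq, PSAM) for name, seq in orthologs.items()}
functional = conserved_affinity(orthologs, PSAM)
```

Because the score is computed over the whole promoter, the orthologous sequences never need to be aligned to one another; only the per-promoter totals are compared.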
The MatrixREDUCE software package is available from http://www.bussemakerlab.org/software/MatrixREDUCE.
Supplementary data are available at Bioinformatics online.