Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14214, USA.
BMC Genomics. 2011 Nov 25;12:578. doi: 10.1186/1471-2164-12-578.
Cis-regulatory modules are bound by transcription factors to regulate gene expression. Characterizing these DNA sequences is central to understanding gene regulatory networks and gaining insight into mechanisms of transcriptional regulation, but genome-scale regulatory module discovery remains a challenge. One popular approach is to scan the genome for clusters of transcription factor binding sites, especially those conserved in related species. When such approaches are successful, it is typically assumed that the activity of the modules is mediated by the identified binding sites and their cognate transcription factors. However, the validity of this assumption is often not assessed.
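The genome-scanning strategy described above can be sketched as a sliding-window search. The scan looks for clusters of binding sites, optionally restricted to sites conserved in related species. This is a minimal illustration under stated assumptions, not the method used in this study. It assumes binding sites have already been called (for example by motif matching) and that each site carries a flag saying whether it is conserved at the aligned position in a related species. The `Site` record, the `find_clusters` function, and the default window size and site threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical record for a putative transcription factor binding site."""
    position: int    # start coordinate on the reference sequence
    factor: str      # name of the cognate transcription factor
    conserved: bool  # assumed flag: site also present at the aligned position in a related species


def find_clusters(sites, window=500, min_sites=3, require_conservation=True):
    """Return (start, end, members) for regions where at least `min_sites`
    qualifying sites have start coordinates within `window` bp of each other.
    `start` and `end` are the start coordinates of the first and last member.
    Overlapping windows are merged into a single cluster."""
    usable = sorted(
        (s for s in sites if s.conserved or not require_conservation),
        key=lambda s: s.position,
    )
    clusters = []
    left = 0
    for right in range(len(usable)):
        # Shrink the window until usable[left..right] spans less than `window` bp.
        while usable[right].position - usable[left].position >= window:
            left += 1
        if right - left + 1 >= min_sites:
            start, end = usable[left].position, usable[right].position
            members = usable[left:right + 1]
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previous cluster: extend it and merge the member sites.
                prev_start, _, prev_members = clusters[-1]
                merged = sorted(set(prev_members) | set(members), key=lambda s: s.position)
                clusters[-1] = (prev_start, end, merged)
            else:
                clusters.append((start, end, members))
    return clusters


if __name__ == "__main__":
    sites = [
        Site(100, "bcd", True),
        Site(180, "hb", True),
        Site(250, "Kr", False),
        Site(300, "kni", True),
        Site(2000, "bcd", True),
    ]
    for start, end, members in find_clusters(sites):
        print(start, end, [s.factor for s in members])
```

The `require_conservation` switch loosely mirrors the contrast between a cluster search that filters on sequence conservation and one that does not. As the abstract cautions, a hit from either mode only shows that sites co-occur. It does not show that those sites or their factors drive the module's activity.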
We successfully predicted five new cis-regulatory modules by combining binding site identification with sequence conservation, and compared these to unsuccessful predictions from a related approach that did not use sequence conservation. Despite greatly improved predictive success, the positive set had degrees of sequence and binding site conservation similar to those of the negative set. We explored the reasons for this by mutagenizing putative binding sites in three cis-regulatory modules. A large proportion of the tested sites had little or no demonstrable role in mediating regulatory element activity. Examination of loss-of-function mutants also showed that some transcription factors supposedly binding to the modules are not required for module function.
Our results raise important questions about interpreting regulatory module predictions obtained by finding clusters of conserved binding sites. Attribution of function to these sites and their cognate transcription factors may be incorrect even when modules are successfully identified. Our study underscores the importance of empirical validation of computational results even when these results are in line with expectation.