National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
Genome Res. 2010 May;20(5):565-77. doi: 10.1101/gr.104471.109. Epub 2010 Apr 2.
Clustering of multiple transcription factor binding sites (TFBSs) for the same transcription factor (TF) is a common feature of cis-regulatory modules in invertebrate animals, but the occurrence of such homotypic clusters of TFBSs (HCTs) in the human genome has remained largely unknown. To explore whether HCTs are also common in humans and other vertebrates, we used known binding motifs for vertebrate TFs and a hidden Markov model-based approach to detect HCTs in the human, mouse, chicken, and fugu genomes, and examined their association with cis-regulatory modules. We found that evolutionarily conserved HCTs occupy nearly 2% of the human genome, with experimental evidence for individual TFs supporting their binding to predicted HCTs. More than half of the promoters of human genes contain HCTs, with a distribution around the transcription start site in agreement with the experimental data from the ENCODE project. In addition, almost half of the 487 experimentally validated developmental enhancers contain them as well, a number more than 25-fold larger than expected by chance. We also found evidence of negative selection acting on TFBSs within HCTs, as the conservation of TFBSs is stronger than the conservation of sequences separating them. The important role of HCTs as components of developmental enhancers is additionally supported by a strong correlation between HCTs and the binding of the enhancer-associated coactivator protein Ep300 (also known as p300). Experimental validation of HCT-containing elements in both zebrafish and mouse suggests that HCTs could be used to predict both the presence of enhancers and their tissue specificity, and are thus a feature that can be effectively used in deciphering the gene regulatory code. In conclusion, our results indicate that HCTs are a pervasive feature of human cis-regulatory modules and suggest that they play an important role in gene regulation in the human and other vertebrate genomes.