Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
Genomics. 2010 Apr;95(4):185-95. doi: 10.1016/j.ygeno.2010.01.002. Epub 2010 Jan 15.
Sequence-specific binding by transcription factors (TFs) interprets regulatory information encoded in the genome. Using recently published universal protein binding microarray (PBM) data on the in vitro DNA binding preferences of these proteins for all possible 8-base-pair sequences, we examined a diverse library of 104 nonredundant mouse TFs spanning 22 different DNA-binding domain structural classes. For each TF, we asked whether its binding sequences are evolutionarily conserved and whether they are enriched within putative regulatory regions. We found that not only high-affinity binding sites but also numerous moderate- and low-affinity binding sites are under negative selection in the mouse genome. These 8-mers occur preferentially in putative regulatory regions of the mouse genome, including CpG islands and non-exonic ultraconserved elements (UCEs). Among TFs whose PBM "bound" 8-mers are enriched within sets of tissue-specific UCEs, many are expressed in the same tissue(s) in which those UCEs drive gene expression. Phylogenetically conserved motif occurrences of various TFs were also enriched in the noncoding sequence surrounding numerous gene sets corresponding to Gene Ontology categories and tissue-specific gene expression clusters, suggesting that these TFs are involved in the transcriptional regulation of those genes. Altogether, our results indicate that many of the sequences bound by these proteins in vitro, including lower-affinity DNA sequences, are likely to be functionally important in vivo. This study not only provides an initial analysis of the potential regulatory associations of 104 mouse TFs, but also presents an approach for the functional analysis of TFs from any other metazoan genome once their DNA binding preferences have been determined by PBMs or other technologies.
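The enrichment of PBM "bound" 8-mers within putative regulatory regions can be illustrated with a minimal sketch. This is not the authors' pipeline. The E-score cutoff of 0.45 used to call an 8-mer "bound", all function names, the example sequences and E-scores, and the use of a simple hit-density ratio in place of any particular statistical test are hypothetical choices made for illustration. The sketch counts how often any bound 8-mer, on either strand, occupies a scanned 8-bp window in a foreground set (for example CpG islands or UCEs) versus a background set. It then reports the ratio of the two densities.

```python
from __future__ import annotations

from typing import Iterable

# Maps each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def select_bound(escores: dict[str, float], threshold: float = 0.45) -> list[str]:
    """Call an 8-mer 'bound' if its PBM E-score meets the cutoff.

    The 0.45 default is an assumed, illustrative cutoff (hypothetical).
    """
    return sorted(k for k, e in escores.items() if e >= threshold)


def both_strands(kmers: Iterable[str]) -> set[str]:
    """Add reverse complements so a site on either strand is counted."""
    out: set[str] = set()
    for kmer in kmers:
        kmer = kmer.upper()
        out.add(kmer)
        out.add(revcomp(kmer))
    return out


def count_hits(seqs: Iterable[str], bound: set[str], k: int = 8) -> tuple[int, int]:
    """Return (windows matching a bound k-mer, total windows scanned).

    Windows containing non-ACGT characters (e.g. 'N') are skipped.
    """
    hits = total = 0
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            window = seq[i : i + k]
            if set(window) - set("ACGT"):
                continue
            total += 1
            if window in bound:
                hits += 1
    return hits, total


def fold_enrichment(
    fg: Iterable[str], bg: Iterable[str], bound_kmers: Iterable[str], k: int = 8
) -> float:
    """Ratio of bound-k-mer hit density in foreground vs. background windows."""
    bound = both_strands(bound_kmers)
    fg_hits, fg_total = count_hits(fg, bound, k)
    bg_hits, bg_total = count_hits(bg, bound, k)
    if fg_total == 0 or bg_total == 0 or bg_hits == 0:
        raise ValueError("need scanned windows in both sets and >=1 background hit")
    return (fg_hits / fg_total) / (bg_hits / bg_total)


if __name__ == "__main__":
    # Made-up E-scores and sequences, for illustration only.
    escores = {"CACGTGAC": 0.49, "AAAAAAAA": 0.10}
    bound = select_bound(escores)
    foreground = ["CACGTGACAA"]
    background = ["A" * 20 + "CACGTGAC" + "AA"]
    print(fold_enrichment(foreground, background, bound))
```

A ratio above 1 means the bound 8-mers are denser in the foreground than in the background. A real analysis would likely also need a significance test and a background matched for base composition, both of which this sketch omits.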