Department of Statistics, University of California, Los Angeles, CA 90095, USA.
Bioinformatics. 2010 Nov 15;26(22):2826-32. doi: 10.1093/bioinformatics/btq546. Epub 2010 Sep 23.
DNA binding proteins play crucial roles in the regulation of gene expression. Transcription factors (TFs) activate or repress genes directly while other proteins influence chromatin structure for transcription. Binding sites of a TF exhibit a similar sequence pattern called a motif. However, a one-to-one map does not exist between each TF and motif. Many TFs in a protein family may recognize the same motif with subtle nucleotide differences leading to different binding affinities. Additionally, a particular TF may bind different motifs under certain conditions, for example in the presence of different co-regulators. The availability of genome-wide binding data of multiple collaborative TFs makes it possible to detect such context-dependent motifs.
We developed a contrast motif finder (CMF) for the de novo identification of motifs that are differentially enriched in two sets of sequences. Applying this method to a number of TF binding datasets from mouse embryonic stem cells, we demonstrate that CMF achieves substantially higher accuracy than several well-known motif finding methods. By contrasting sequences bound by distinct sets of TFs, CMF identified two different motifs that may be recognized by Oct4, depending on the presence of another co-regulator, and detected subtle motif signals that may be associated with potential competitive binding between Sox2 and Tcf3.
The software CMF is freely available for academic use at www.stat.ucla.edu/~zhou/CMF.
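The idea of a motif being differentially enriched in two sets of sequences can be illustrated with a toy k-mer contrast. The sketch below is not the CMF algorithm, whose internals the abstract does not describe. It only ranks fixed-length words by the log ratio of the fraction of sequences containing them in each set. The function names (`kmer_presence`, `contrast_kmers`), the pseudocount and the example sequences are all hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def kmer_presence(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        # A set is used so that each sequence contributes at most once per k-mer.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def contrast_kmers(set_a, set_b, k=4, pseudo=1.0):
    """Rank k-mers by the log2 ratio of their presence frequency in A versus B.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for k-mers that are absent from one of the two sets.
    """
    ca, cb = kmer_presence(set_a, k), kmer_presence(set_b, k)
    scores = {}
    for kmer in set(ca) | set(cb):
        fa = (ca[kmer] + pseudo) / (len(set_a) + 2 * pseudo)
        fb = (cb[kmer] + pseudo) / (len(set_b) + 2 * pseudo)
        scores[kmer] = log2(fa / fb)
    # Highest scores first: k-mers enriched in A relative to B.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up toy sequences: every sequence in set_a carries the word ATGC,
# and no sequence in set_b does.
set_a = ["TTATGCAA", "GGATGCTT", "CCATGCGG"]
set_b = ["TTTTAAAA", "GGGGCCCC", "AACCGGTT"]

ranked = contrast_kmers(set_a, set_b, k=4)
print(ranked[0])  # the k-mer most enriched in set_a relative to set_b
```

Real binding data would call for a position weight matrix model and a proper statistical contrast rather than exact word counts; the sketch only makes the notion of "enriched in one set relative to the other" concrete.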