Gladstone Institutes, San Francisco, CA 94158, USA.
Gladstone Institutes, San Francisco, CA 94158, USA; Department of Pediatrics and Cardiovascular Research Institute, University of California, San Francisco, San Francisco, CA 94158, USA.
Cell Syst. 2019 Jan 23;8(1):27-42.e6. doi: 10.1016/j.cels.2018.12.001. Epub 2019 Jan 16.
DNA shape adds specificity to sequence motifs but has not been explored systematically outside this context. We hypothesized that DNA-binding proteins (DBPs) preferentially occupy DNA with specific structures ("shape motifs") regardless of whether or not these correspond to high information content sequence motifs. We present ShapeMF, a Gibbs sampling algorithm that identifies de novo shape motifs. Using binding data from hundreds of in vivo and in vitro experiments, we show that most DBPs have shape motifs and can occupy these in the absence of sequence motifs. This "shape-only binding" is common for many DBPs and in regions co-bound by multiple DBPs. When shape and sequence motifs co-occur, they can be overlapping, flanking, or separated by consistent spacing. Finally, DBPs within the same protein family have different shape motifs, explaining their distinct genome-wide occupancy despite having similar sequence motifs. These results suggest that shape motifs not only complement sequence motifs but also facilitate recognition of DNA beyond conventionally defined sequence motifs.
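The abstract describes ShapeMF only as "a Gibbs sampling algorithm that identifies de novo shape motifs." The code below is not ShapeMF. It is a minimal, hypothetical sketch of Gibbs sampling on shape values, and it rests on several simplifying assumptions:

- There is one shape feature per base (for example, minor groove width).
- Each motif position is modeled as a Gaussian. Its mean is estimated from the other sequences' current windows, and all positions share one standard deviation.
- Windows are scored against a single background Gaussian.

All function names (`gibbs_shape_motif`, `window_llr`, `column_means`) are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def window_llr(window: Sequence[float], profile: Sequence[float],
               bg_mean: float, sd: float) -> float:
    """Log-likelihood ratio of a window under a per-position Gaussian motif
    (means = profile) versus one background Gaussian, both with the same sd.
    With a shared sd the normalising constants cancel, leaving squared errors."""
    return sum(((x - bg_mean) ** 2 - (x - m) ** 2) / (2 * sd * sd)
               for x, m in zip(window, profile))


def column_means(windows: List[Sequence[float]]) -> List[float]:
    """Per-position mean of equal-length windows (the motif 'profile')."""
    return [sum(col) / len(col) for col in zip(*windows)]


def gibbs_shape_motif(seqs: List[List[float]], w: int, sweeps: int = 100,
                      restarts: int = 5, seed: int = 0) -> Tuple[List[int], float]:
    """Hypothetical Gibbs sampler for one width-w motif in shape tracks
    (an illustration only, not the ShapeMF implementation).

    Returns the best window start offsets seen (one per sequence) and their
    total log-likelihood ratio."""
    rng = random.Random(seed)
    values = [v for s in seqs for v in s]
    bg_mean = sum(values) / len(values)
    # Assumption: one shared sd, taken from all values, for motif and background.
    sd = math.sqrt(sum((v - bg_mean) ** 2 for v in values) / len(values))

    def total_score(offsets: List[int]) -> float:
        wins = [s[o:o + w] for s, o in zip(seqs, offsets)]
        prof = column_means(wins)
        return sum(window_llr(win, prof, bg_mean, sd) for win in wins)

    best_offsets: List[int] = []
    best_score = -math.inf
    for _ in range(restarts):
        # Random initial window in every sequence.
        offsets = [rng.randrange(len(s) - w + 1) for s in seqs]
        for _ in range(sweeps):
            for i, s in enumerate(seqs):
                # Build the profile from every *other* sequence's current window.
                others = [seqs[k][offsets[k]:offsets[k] + w]
                          for k in range(len(seqs)) if k != i]
                prof = column_means(others)
                scores = [window_llr(s[o:o + w], prof, bg_mean, sd)
                          for o in range(len(s) - w + 1)]
                top = max(scores)  # subtract the max before exp() for stability
                weights = [math.exp(x - top) for x in scores]
                # Resample this sequence's window in proportion to exp(LLR).
                offsets[i] = rng.choices(range(len(scores)), weights=weights)[0]
            score = total_score(offsets)
            if score > best_score:
                best_offsets, best_score = list(offsets), score
    return best_offsets, best_score
```

As a usage check, plant one fixed six-value profile into random background tracks and call `gibbs_shape_motif(seqs, w=6)`. The returned offsets should line up with the planted sites up to a constant shift. A constant shift is the classic phase-shift local optimum of Gibbs motif samplers, and random restarts only partly mitigate it.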