Department of Biological Sciences, Columbia University, New York, NY, USA.
Program in Applied Physics and Applied Mathematics, Columbia University, New York, NY, USA.
Mol Syst Biol. 2018 Feb 22;14(2):e7902. doi: 10.15252/msb.20177902.
Transcription factors (TFs) interpret DNA sequence by probing the chemical and structural properties of the nucleotide polymer. DNA shape is thought to enable a parsimonious representation of dependencies between nucleotide positions. Here, we propose a unified mathematical representation of the DNA sequence dependence of shape and TF binding, respectively, which simplifies and enhances analysis of shape readout. First, we demonstrate that linear models based on mononucleotide features alone account for 60-70% of the variance in minor groove width, roll, helix twist, and propeller twist. This explains why simple scoring matrices that ignore all dependencies between nucleotide positions can partially account for DNA shape readout by a TF. Adding dinucleotide features as sequence-to-shape predictors to our model, we can almost perfectly explain the shape parameters. Building on this observation, we developed a post hoc analysis method that can be used to analyze any mechanism-agnostic protein-DNA binding model in terms of shape readout. Our insights provide an alternative strategy for using DNA shape information to enhance our understanding of how cis-regulatory codes are interpreted by the cellular machinery.
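To make the distinction between mononucleotide and dinucleotide predictors concrete, the following is a minimal sketch of how a short sequence window might be encoded for a linear sequence-to-shape model. The function names, the one-hot encoding, and the five-base window are illustrative assumptions, not the parametrization used in the paper, and no fitted weights are implied.

```python
from itertools import product

BASES = "ACGT"
# All 16 ordered dinucleotides: AA, AC, ..., TT
DINUCS = ["".join(pair) for pair in product(BASES, repeat=2)]


def mono_features(seq):
    """One-hot encode each position independently (4 indicators per base).

    A linear model on these features cannot represent any dependency
    between positions, like a simple scoring matrix.
    """
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def dinuc_features(seq):
    """One-hot encode each adjacent base pair (16 indicators per step).

    These features let a linear model capture nearest-neighbor
    dependencies between positions.
    """
    return [
        1.0 if seq[i:i + 2] == d else 0.0
        for i in range(len(seq) - 1)
        for d in DINUCS
    ]


def linear_prediction(features, weights, intercept=0.0):
    """Predicted shape value as intercept plus a weighted sum of features.

    The weights are hypothetical here; in practice they would be
    estimated by regression against tabulated shape parameters.
    """
    return intercept + sum(f * w for f, w in zip(features, weights))


window = "ACGTA"  # hypothetical 5-base window
x_mono = mono_features(window)    # 5 positions x 4 bases = 20 features
x_dinuc = dinuc_features(window)  # 4 steps x 16 dinucleotides = 64 features
```

In this sketch, a mononucleotide-only model would use `x_mono` alone, while the extended model would use the concatenation `x_mono + x_dinuc`. The abstract's claim is that the first already captures most of the variance in shape parameters and the second captures nearly all of it.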