Peng Pei-Chen, Sinha Saurabh
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
Nucleic Acids Res. 2016 Jul 27;44(13):e120. doi: 10.1093/nar/gkw446. Epub 2016 Jun 1.
Prediction of gene expression levels driven by regulatory sequences is pivotal in genomic biology. A major focus in transcriptional regulation is sequence-to-expression modeling, which interprets the enhancer sequence based on transcription factor concentrations and DNA binding specificities and predicts precise gene expression levels in varying cellular contexts. Such models largely rely on the position weight matrix (PWM) model for DNA binding, and the effect of alternative models based on DNA shape remains unexplored. Here, we propose a statistical thermodynamics model of gene expression using DNA shape features of binding sites. We used rigorous methods to evaluate the fits of expression readouts of 37 enhancers regulating spatial gene expression patterns in the Drosophila embryo, and show that DNA shape-based models perform arguably better than PWM-based models. We also observed that DNA shape captures information complementary to the PWM, in a way that is useful for expression modeling. Furthermore, we tested whether combining shape-based and PWM-based features provides better predictions than using either binding model alone. Our work demonstrates that the increasingly popular DNA-binding models based on local DNA shape can be useful in sequence-to-expression modeling. It also provides a framework for future studies to predict gene expression better than with PWM models alone.
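As a rough illustration of the PWM-based binding model that the abstract contrasts with DNA shape, the following minimal Python sketch scores a candidate site with a log-odds PWM and converts the score into a binding probability using a simple two-state thermodynamic formula. The count matrix, the function names, the pseudocount, and the single-site occupancy formula are all illustrative assumptions; they are not the authors' model, parameters, or data.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix for a 3-bp motif (rows: positions; columns: A, C, G, T).
COUNTS = [
    [8, 1, 1, 0],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
]


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert base counts into log-odds scores against a uniform background."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pwm.append([math.log(((c + pseudocount) / total) / background) for c in row])
    return pwm


def score_site(pwm, site):
    """Sum the per-position log-odds scores for a site of the motif's length."""
    if len(site) != len(pwm):
        raise ValueError("site length must match motif length")
    return sum(pwm[i][BASES.index(base)] for i, base in enumerate(site))


def occupancy(score, max_score, concentration):
    """Two-state binding probability (assumed form): q / (1 + q),
    where q = concentration * exp(score - max_score)."""
    q = concentration * math.exp(score - max_score)
    return q / (1.0 + q)


pwm = log_odds_pwm(COUNTS)
max_score = sum(max(row) for row in pwm)  # score of the consensus site
for site in ("ACG", "ACT", "TTT"):
    s = score_site(pwm, site)
    print(site, round(s, 3), round(occupancy(s, max_score, concentration=1.0), 3))
```

In this sketch, a shape-based model would replace `score_site` with a function of local DNA shape features of the site, while the occupancy step would stay the same; that substitution is a simplification for illustration only.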