CNRS, Univ. Grenoble Alpes, CEA, INRA, BIG-LPCV, 38000 Grenoble, France.
Mol Plant. 2019 Jun 3;12(6):743-763. doi: 10.1016/j.molp.2018.10.010. Epub 2018 Nov 15.
Transcription factors (TFs) are key cellular components that control gene expression. They recognize specific DNA sequences, the TF binding sites (TFBSs), and are thereby targeted to specific regions of the genome, where they can recruit transcriptional co-factors and/or chromatin regulators to fine-tune spatiotemporal gene regulation. The identification of TFBSs in genomic sequences and their subsequent quantitative modeling are therefore of crucial importance for understanding and predicting gene expression. Here, we review how TFBSs can be determined experimentally, how TFBS models can be constructed in silico, and how these models can be optimized by taking into account features such as position interdependence within TFBSs and DNA shape, and/or by introducing state-of-the-art computational algorithms such as deep learning methods. In addition, we discuss the integration of context variables into TFBS modeling, including nucleosome positioning, chromatin states, methylation patterns, 3D genome architecture, and TF cooperative binding, in order to better predict TF binding in cellular contexts. Finally, we explore the possibilities of combining optimized TFBS models with technological advances, such as targeted TFBS perturbation by CRISPR, to better understand gene regulation, evolution, and plant diversity.
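As an illustration of what constructing a TFBS model in silico can look like in its simplest form, the sketch below builds a position weight matrix (PWM) from a handful of aligned binding sites and scans a sequence for the best-scoring window. It is a minimal sketch, not the method of any particular tool discussed in the review: the toy sites, the uniform background frequency of 0.25, the pseudocount of 1, and the function names `build_pwm`, `score`, and `best_hit` are all assumptions made for illustration. A PWM treats positions as independent, which is precisely the simplification that the position-interdependence and DNA-shape models mentioned above aim to relax.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned binding sites.

    Assumes a uniform background and a fixed pseudocount (illustrative choices).
    Returns one dict per position mapping base -> log2(frequency / background).
    """
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        # Start every base at the pseudocount so unseen bases keep a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds scores for a window as wide as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window on the given strand."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical aligned sites, invented for this example (not real data).
toy_sites = ["CACGTG", "CACGTG", "CACGTC", "CATGTG"]
pwm = build_pwm(toy_sites)

hit_score, hit_start = best_hit(pwm, "TTTCACGTGAAA")
print(f"best window starts at {hit_start} with score {hit_score:.2f}")
```

This sketch scans only the forward strand and ignores every context variable listed in the abstract (nucleosome positioning, chromatin state, methylation, cooperative binding); a practical scanner would at least also score the reverse complement.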