Khetan Shubham, Carroll Brent S, Bulyk Martha L
Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
Nature. 2025 Sep 3. doi: 10.1038/s41586-025-09472-3.
Transcription factors (TFs) regulate gene expression by interacting with DNA in a sequence-specific manner. High-throughput in vitro technologies, such as protein-binding microarrays and HT-SELEX (high-throughput systematic evolution of ligands by exponential enrichment), have revealed the DNA-binding specificities of hundreds of TFs. However, they have limited ability to reliably identify lower-affinity DNA binding sites, which are increasingly recognized as important for precise spatiotemporal control of gene expression. Here, to address this limitation, we developed protein affinity to DNA by in vitro transcription and RNA sequencing (PADIT-seq), with which we comprehensively assayed the binding preferences of six TFs to all possible ten-base-pair DNA sequences, detecting hundreds of novel, lower-affinity binding sites. The expanded repertoire of lower-affinity binding sites revealed that nucleotides flanking high-affinity DNA binding sites create overlapping lower-affinity sites that together modulate TF genomic occupancy in vivo. We propose a model in which TF binding is not determined by individual binding sites, but rather by the sum of multiple, overlapping binding sites. The overlapping binding model explains how competition between paralogous TFs for shared high-affinity binding sites is determined by flanking nucleotides that create differential numbers of overlapping, lower-affinity binding sites. Critically, the model transforms our understanding of noncoding-variant effects, revealing how single nucleotide changes simultaneously alter multiple overlapping sites to additively influence gene expression and human traits, including diseases.
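The overlapping binding model described above can be illustrated with a minimal sketch, assuming that the contribution of each k-mer window in a sequence is simply added. This is not the authors' PADIT-seq analysis code. The function names, the 4-bp window length, and the score table are hypothetical values chosen for readability; the study itself assayed ten-base-pair sequences.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlapping_site_score(sequence: str, kmer_scores: Dict[str, float], k: int) -> float:
    """Sum the scores of every overlapping k-mer window in `sequence`.

    A window is looked up as written and, if absent, as its reverse
    complement, since a site on double-stranded DNA can be read from
    either strand. Windows missing from the table contribute 0.
    """
    sequence = sequence.upper()
    total = 0.0
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        score = kmer_scores.get(window)
        if score is None:
            score = kmer_scores.get(reverse_complement(window), 0.0)
        total += score
    return total


# Hypothetical toy table: one high-affinity 4-mer and two lower-affinity
# 4-mers that overlap it. These numbers are illustrative, not measured data.
TOY_SCORES = {"TGAC": 1.0, "GACA": 0.5, "CTGA": 0.25}

# Same high-affinity core (TGAC), different flanking nucleotides.
favourable_flanks = overlapping_site_score("CTGACA", TOY_SCORES, k=4)
neutral_flanks = overlapping_site_score("GTGACG", TOY_SCORES, k=4)

# A single-nucleotide change (A to G in the core) disrupts all three
# overlapping windows at once.
variant = overlapping_site_score("CTGGCA", TOY_SCORES, k=4)

print(favourable_flanks, neutral_flanks, variant)
```

In this toy setting, the two sequences share the same high-affinity core but receive different totals because only one of them has flanking nucleotides that create additional lower-affinity windows. A simple sum is the simplest reading of "additively"; a real analysis would use measured affinities and might weight or threshold them differently.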