Sigalova Olga M, Forneris Mattia, Stojanovska Frosina, Zhao Bingqing, Viales Rebecca R, Rabinowitz Adam, Hammal Fayrouz, Ballester Benoît, Zaugg Judith B, Furlong Eileen E M
European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany.
European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, D-69117 Heidelberg, Germany.
Genome Res. 2025 May 2;35(5):1138-1153. doi: 10.1101/gr.279652.124.
Understanding how genetic variation impacts transcription factor (TF) binding remains a major challenge, limiting our ability to model disease-associated variants. Here, we used a highly controlled system of F1 crosses with extensive genetic diversity to profile allele-specific binding of four TFs at several time points during embryogenesis. Using a combined haplotype test, we identified 9%-18% of TF-bound regions impacted by genetic variation even for essential regulators. By expanding WASP (a tool for allele-specific read mapping) to examine indels, we increased detection of allelically imbalanced peaks by 30%-50%. This fine-grained "mutagenesis" can reconstruct functionalized binding motifs for all factors. To prioritize causal variants, we trained a convolutional neural network (Basenji) to accurately predict binding from DNA sequence. The model can also predict measured allelic imbalance for strong-effect variants, providing a mechanistic interpretation for how the variant impacts binding. This reveals unexpected relationships between TFs, including potential cooperative pairs, and mechanisms of tissue-specific recruitment of the ubiquitous factor CTCF.
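As a rough illustration of what "allelic imbalance" at a TF-bound peak means, the sketch below applies a plain exact binomial test to the reference and alternate allele read counts at a single heterozygous site. This is a deliberately simplified stand-in and not the combined haplotype test used in the study; the function names, the example counts, and the null expectation of a 0.5 reference fraction are all assumptions made for illustration.

```python
from math import comb


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of allele-informative reads carrying the reference allele."""
    return ref_reads / (ref_reads + alt_reads)


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance.

    Hypothetical helper: sums the probability of every outcome that is no
    more likely than the observed one under the null reference fraction
    p_ref. Intended for modest read depths only; at very high depth the
    naive probability computation can underflow or overflow.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no informative reads, no evidence of imbalance
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so ties are not lost to floating-point noise.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Made-up counts for one heterozygous site inside a peak.
    ref, alt = 30, 10
    print(f"reference fraction = {allelic_ratio(ref, alt):.2f}")
    print(f"two-sided p-value  = {binomial_two_sided_p(ref, alt):.4g}")
```

A per-site test like this ignores overdispersion, mapping bias, and total read depth across individuals, which are among the issues that tools such as WASP are designed to handle, so it should be read only as a conceptual aid.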