Department of Biomedical Engineering, Michigan State University, East Lansing, Michigan 48824, USA.
Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan 48824, USA.
Toxicol Sci. 2023 Nov 28;196(2):170-186. doi: 10.1093/toxsci/kfad094.
The aryl hydrocarbon receptor (AhR) is an inducible transcription factor whose ligands include the potent environmental contaminant 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). Ligand-activated AhR binds to DNA at dioxin response elements (DREs) containing the core motif 5'-GCGTG-3'. However, AhR binding is highly tissue specific: most DREs in accessible chromatin are not bound by TCDD-activated AhR, and DREs accessible in multiple tissues can be bound in some and unbound in others. In this respect, AhR behaves like many nuclear receptors. Because AhR has a strong core motif, its binding is well suited to a motif-centered analysis. We developed interpretable machine learning models predicting the AhR binding status of DREs in MCF-7, GM17212, and HepG2 cells, as well as primary human hepatocytes. Cross-tissue models predicting transcription factor (TF)-DNA binding generally perform poorly, but the reasons for this low performance remain unexplored. By interpreting the results of individual within-tissue models and by examining the features leading to low cross-tissue performance, we identified sequence and chromatin context patterns correlated with AhR binding. We conclude that AhR binding is driven by a complex interplay of tissue-agnostic DRE flanking DNA sequence and tissue-specific local chromatin context. Additionally, we demonstrate that interpretable machine learning models can provide novel and experimentally testable mechanistic insights into DNA binding by inducible TFs.
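As a rough illustration of what a motif-centered analysis starts from, the sketch below locates every occurrence of the DRE core 5'-GCGTG-3' on either strand of a sequence and returns it with its flanking bases. It is not the authors' pipeline: the function names, the `flank` parameter, and its default of 7 bases are assumptions made for this example only.

```python
# Illustrative sketch only; names and the default flank width are hypothetical,
# not taken from the study.
CORE = "GCGTG"  # DRE core motif, 5'->3'
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_dre_cores(seq, flank=7):
    """Find DRE cores on both strands.

    Returns a list of (position, strand, context) tuples, where position is
    the 0-based start of the 5-mer on the given (plus-strand) sequence and
    context is the plus-strand text of the core plus up to `flank` bases on
    each side (truncated at the sequence ends).
    """
    seq = seq.upper()
    rc_core = reverse_complement(CORE)  # how a minus-strand core reads on the plus strand
    k = len(CORE)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == CORE:
            strand = "+"
        elif window == rc_core:
            strand = "-"
        else:
            continue
        start = max(0, i - flank)
        end = min(len(seq), i + k + flank)
        hits.append((i, strand, seq[start:end]))
    return hits
```

The flanking context collected this way is the kind of sequence feature that could be paired with per-tissue chromatin signals when building a bound-versus-unbound classifier for each DRE.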