Tan Jimin, Fu Xi, Ling Xinyu, Mo Shentong, Bai Jiangshan, Rabadán Raúl, Fenyö David, Boeke Jef D, Tsirigos Aristotelis, Xia Bo
Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY, USA.
bioRxiv. 2025 Aug 17:2025.08.17.670761. doi: 10.1101/2025.08.17.670761.
Chromatin-associated proteins (CAPs), including over 1,600 transcription factors, bind directly or indirectly to genomic DNA to regulate gene expression and determine a myriad of cell types. Mapping their genome-wide binding and co-binding landscapes is essential for a mechanistic understanding of their functions in gene regulation and the resulting cellular phenotypes. However, because techniques that scale effectively across proteins and biological samples are lacking, their genome-wide binding profiles remain challenging to obtain, particularly in primary cells. Here we present Chromnitron, a multimodal foundation model that accurately predicts CAP binding landscapes across hundreds of proteins in unseen cell types. Through perturbation experiments, we show that the model has learned principles of CAP binding from multimodal features, including DNA sequence motifs, chromatin accessibility levels, and protein functional domains. Applying Chromnitron to the study of cell fate transitions, we discovered novel CAPs that regulate the T cell exhaustion process. Furthermore, Chromnitron can predict dynamic CAP binding landscapes during development, revealing the global orchestration of protein and regulatory element activities in neurogenesis. We expect Chromnitron to accelerate discovery and engineering in regulatory genomics, particularly in human primary cells, and to enable future therapeutic opportunities.