Gao Vianne R, Yang Rui, Das Arnav, Luo Renhe, Luo Hanzhi, McNally Dylan R, Karagiannidis Ioannis, Rivas Martin A, Wang Zhong-Min, Barisic Darko, Karbalayghareh Alireza, Wong Wilfred, Zhan Yingqian A, Chin Christopher R, Noble William, Bilmes Jeff A, Apostolou Effie, Kharas Michael G, Béguelin Wendy, Viny Aaron D, Huangfu Danwei, Rudensky Alexander Y, Melnick Ari M, Leslie Christina S
Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA.
bioRxiv. 2023 Jul 28:2023.07.27.550836. doi: 10.1101/2023.07.27.550836.
The identification of cell-type-specific 3D chromatin interactions between regulatory elements can help to decipher gene regulation and to interpret the function of disease-associated non-coding variants. However, current chromosome conformation capture (3C) technologies are unable to resolve interactions at this resolution when only small numbers of cells are available as input. We therefore present ChromaFold, a deep learning model that predicts 3D contact maps and regulatory interactions from single-cell ATAC sequencing (scATAC-seq) data alone. ChromaFold uses pseudobulk chromatin accessibility, co-accessibility profiles across metacells, and predicted CTCF motif tracks as input features and employs a lightweight architecture to enable training on standard GPUs. Once trained on paired scATAC-seq and Hi-C data in human cell lines and tissues, ChromaFold can accurately predict both the 3D contact map and peak-level interactions across diverse human and mouse test cell types. In benchmarking against a recent deep learning method that uses bulk ATAC-seq, DNA sequence, and CTCF ChIP-seq to make cell-type-specific predictions, ChromaFold yields superior prediction performance when including CTCF ChIP-seq data as an input and comparable performance without. Finally, fine-tuning ChromaFold on paired scATAC-seq and Hi-C in a complex tissue enables deconvolution of chromatin interactions across cell subpopulations. ChromaFold thus achieves state-of-the-art prediction of 3D contact maps and regulatory interactions using scATAC-seq alone as input data, enabling accurate inference of cell-type-specific interactions in settings where 3C-based assays are infeasible.
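The abstract names two scATAC-seq-derived input features: pseudobulk accessibility and co-accessibility across metacells. The following is a minimal sketch of how such features could be computed from a cells-by-bins count matrix. It is not ChromaFold's actual preprocessing: the function names, the toy data, and the use of Pearson correlation as the co-accessibility measure are all assumptions made for illustration, and the metacell assignments are taken as given.

```python
import numpy as np

def pseudobulk_profile(counts):
    """Sum per-cell counts over all cells to get one accessibility track."""
    return counts.sum(axis=0)

def metacell_profiles(counts, metacell_ids):
    """Average per-cell counts within each metacell -> (n_metacells, n_bins)."""
    labels = np.unique(metacell_ids)
    return np.stack([counts[metacell_ids == m].mean(axis=0) for m in labels])

def coaccessibility(profiles, eps=1e-8):
    """Pearson correlation between every pair of bins across metacells.

    Assumption: correlation stands in for whatever co-accessibility
    statistic the real model uses.
    """
    centered = profiles - profiles.mean(axis=0, keepdims=True)
    norms = np.sqrt((centered ** 2).sum(axis=0)) + eps
    normalized = centered / norms
    return normalized.T @ normalized  # (n_bins, n_bins)

# Toy data (hypothetical): 200 cells x 50 genomic bins, 10 metacells.
rng = np.random.default_rng(0)
counts = rng.poisson(0.3, size=(200, 50)).astype(float)
metacell_ids = rng.integers(0, 10, size=200)

bulk = pseudobulk_profile(counts)                # shape (50,)
profiles = metacell_profiles(counts, metacell_ids)
coacc = coaccessibility(profiles)                # shape (50, 50)
```

In a setup like this, the pseudobulk track and the rows of the co-accessibility matrix would be stacked with a CTCF motif track to form the per-region model input. How the published model combines them is not specified in the abstract.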