UK Dementia Research Institute at Imperial College London, London, W12 0BZ, UK.
Department of Brain Sciences, Imperial College London, London, W12 0BZ, UK.
Nat Commun. 2024 Nov 16;15(1):9951. doi: 10.1038/s41467-024-54441-5.
Understanding how genetic variants affect the epigenome is key to interpreting genome-wide association studies (GWAS), yet profiling these effects across the non-coding genome remains challenging because experimental approaches scale poorly. This necessitates accurate computational models. Existing machine learning approaches, while progressively improving, are confined to the cell types they were trained on, which limits their applicability. Here, we introduce Enformer Celltyping, a deep learning model that incorporates distal effects of DNA interactions, up to 100,000 base pairs away, to predict epigenetic signals in previously unseen cell types. Using DNA and chromatin accessibility data for epigenetic imputation, Enformer Celltyping outperforms current best-in-class approaches and generalises across cell types and biological regions. Moreover, we propose a framework for evaluating models on genetic variant effect prediction using regulatory quantitative trait loci mapping studies, which highlights current limitations of genomic deep learning models. Despite these limitations, Enformer Celltyping can also be used to study cell type-specific genetic enrichment of complex traits.