Gao Zijing, Liu Qiao, Zeng Wanwen, Jiang Rui, Wong Wing Hung
Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China.
Department of Statistics, Stanford University, Stanford, CA, 94305, USA.
Genome Biol. 2024 Dec 18;25(1):310. doi: 10.1186/s13059-024-03449-7.
The inherent similarities between natural language and biological sequences have inspired the use of large language models in genomics, but current models struggle to incorporate chromatin interactions or to make predictions in unseen cellular contexts. To address this, we propose EpiGePT, a transformer-based model designed for predicting context-specific human epigenomic signals. By incorporating transcription factor activities and 3D genome interactions, EpiGePT outperforms existing methods in epigenomic signal prediction tasks, especially in predicting cell-type-specific long-range interactions and the impacts of genetic variants, advancing our understanding of gene regulation. A free online prediction service is available at http://health.tsinghua.edu.cn/epigept.