Zhang Tianjiao, Zhao Xingjie, Sun Hao, Gao Bo, Liu Xiaoqi
College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.
Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin 150081, China.
Genes (Basel). 2024 Nov 25;15(12):1511. doi: 10.3390/genes15121511.
Enhancer-promoter interaction (EPI) is a critical component of gene regulatory networks, and characterizing it is important for understanding the complexity of gene expression. Traditional EPI prediction methods focus on one-to-one interactions and neglect the more complex one-to-many and many-to-many patterns. To address this gap, we use graph neural networks to explore all interaction patterns between enhancers and promoters, capturing complex regulatory relationships for more accurate predictions.
In this study, we introduce a novel EPI prediction framework, GATv2EPI, based on dynamic graph attention neural networks. GATv2EPI leverages epigenetic information from enhancers, promoters, and their surrounding regions, and organizes interactions into a network to explore complex EPI regulatory patterns, including one-to-one, one-to-many, and many-to-many relationships. To avoid overfitting and ensure diverse data representation, we partition the dataset with a connectivity-based sampling method: a graph is constructed for each chromosome, and each connected subgraph is assigned in its entirety to either the training set or the test set. This prevents information leakage between the two sets and ensures comprehensive chromosomal representation.
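The connectivity-based partitioning described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `connected_components` and `split_by_component`, the `test_fraction` parameter, and the input layout (a dictionary mapping each chromosome to a list of enhancer-promoter pairs) are all assumptions made for illustration. The sketch only shows the core idea that every connected subgraph lands wholly in one set, so no enhancer or promoter is shared between training and test data.

```python
import random
from collections import defaultdict


def connected_components(edges):
    """Group undirected edges into connected components using union-find.

    `edges` is a list of (enhancer_id, promoter_id) pairs; the IDs are
    hypothetical and must be unique across enhancers and promoters.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = defaultdict(list)
    for u, v in edges:
        groups[find(u)].append((u, v))
    return list(groups.values())


def split_by_component(pairs_by_chrom, test_fraction=0.2, seed=0):
    """Assign whole connected components to train or test, per chromosome.

    Assumption: components are shuffled and moved to the test set until
    roughly `test_fraction` of that chromosome's pairs are covered. A
    chromosome with a single component therefore ends up entirely in the
    test set; a real pipeline would need a policy for that case.
    """
    rng = random.Random(seed)
    train, test = [], []
    for chrom in sorted(pairs_by_chrom):
        components = connected_components(pairs_by_chrom[chrom])
        rng.shuffle(components)
        n_total = sum(len(c) for c in components)
        n_test = 0
        for comp in components:
            records = [(chrom, e, p) for e, p in comp]
            if n_test < test_fraction * n_total:
                test.extend(records)
                n_test += len(comp)
            else:
                train.extend(records)
    return train, test
```

Because the split is made at the level of components rather than individual pairs, the realized test fraction is only approximate and depends on the component size distribution of each chromosome.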
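In experiments conducted on four cell lines, namely NHEK (normal human epidermal keratinocytes), IMR90 (human fetal lung fibroblasts), HMEC (human mammary epithelial cells), and K562 (human chronic myelogenous leukemia cells), GATv2EPI achieved higher EPI recognition accuracy than existing comparable methods and reduced training time by 95.29% relative to TransEPI.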
GATv2EPI enhances EPI prediction accuracy by capturing complex topological structure information from gene regulatory networks through graph neural networks. Additionally, our results emphasize the importance of epigenetic features surrounding enhancers and promoters in EPI prediction.
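For reference, the dynamic attention that gives GATv2 its name is usually written as follows. This is the standard GATv2 formulation, not a description of how GATv2EPI configures it; the abstract does not specify layer counts, heads, or feature dimensions. Here $h_i$ is the feature vector of node $i$ (an enhancer or promoter), $\mathcal{N}_i$ is its set of neighbors in the interaction graph, $W$ and $a$ are learned parameters, $\Vert$ denotes concatenation, and $W_r$ is the block of $W$ that acts on the neighbor features.

```latex
\begin{align}
  e_{ij} &= a^{\top}\,\mathrm{LeakyReLU}\!\left(W\,[\,h_i \,\Vert\, h_j\,]\right),
  \qquad W = [\,W_l \;\; W_r\,], \\
  \alpha_{ij} &= \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}, \\
  h_i' &= \sigma\!\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\, W_r\, h_j\right).
\end{align}
```

Because the nonlinearity is applied before the projection by $a$, the ranking of neighbors can differ from one query node to another, which is what lets a single promoter weight its multiple enhancers differently in one-to-many and many-to-many settings.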