College of Life Sciences, Northwest A&F University, Taicheng Road, Yangling, 712100, China.
College of Information Engineering, Northwest A&F University, Taicheng Road, Yangling, 712100, China.
Interdiscip Sci. 2022 Sep;14(3):786-794. doi: 10.1007/s12539-022-00525-z. Epub 2022 May 28.
Enhancer-promoter interactions (EPIs) are an essential step in gene regulation. However, detecting EPIs with traditional wet-lab techniques is time-consuming and expensive, so computational methods are valuable for understanding the mechanism of EPIs. A number of approaches have been proposed to address this problem, but existing methods still leave room for exploration and improvement.
In this study, we propose a novel deep learning model, EPI-Mind, that predicts EPIs from sequence features. First, enhancer and promoter sequences were encoded with pre-trained DNA vectors. Then, a convolutional neural network (CNN) was used to roughly extract global and local features. Finally, a transformer was introduced to extract features further. We first trained a model, EPI-Mind_spe, that predicts EPIs within a single cell line. To enable prediction across cell lines and further improve performance, a second round of training was carried out: the re-divided dataset was used to train a new model, EPI-Mind_gen, that predicts EPIs across different cell lines. To further improve accuracy, a third model, EPI-Mind_best, was trained using EPI-Mind_gen as a pre-trained model.
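The encoding, CNN and transformer stages can be illustrated with a toy forward pass. This is a minimal sketch in pure Python, not the EPI-Mind implementation. It assumes overlapping k-mer tokens with k = 3, a randomly initialised lookup table standing in for the pre-trained DNA vectors, a single convolutional layer with ReLU, and one head of scaled dot-product self-attention with identity query/key/value projections standing in for the transformer. It then applies mean pooling and a logistic output over the concatenated enhancer and promoter features. All function names (`kmers`, `embed`, `conv1d_relu`, `self_attention`, `make_params`, `predict`) and all sizes are hypothetical, and the weights are untrained.

```python
import itertools
import math
import random


def kmers(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed(seq, table, k=3, dim=4):
    """Look up a vector per k-mer; unknown k-mers (e.g. containing N) map to zeros."""
    zero = [0.0] * dim
    return [table.get(km, zero) for km in kmers(seq, k)]


def conv1d_relu(x, filters, width):
    """'Valid' 1D convolution along the sequence axis, then ReLU.
    x is L x d; each filter is a width x d matrix; output is (L - width + 1) x n_filters."""
    out = []
    for t in range(len(x) - width + 1):
        row = []
        for w in filters:
            s = sum(w[j][c] * x[t + j][c] for j in range(width) for c in range(len(x[0])))
            row.append(max(0.0, s))
        out.append(row)
    return out


def self_attention(x):
    """Single-head scaled dot-product self-attention (identity Q/K/V projections)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_params(k=3, dim=4, n_filters=5, width=3, seed=0):
    """Random (untrained) parameters; the table stands in for pre-trained k-mer vectors."""
    rng = random.Random(seed)
    table = {"".join(p): [rng.gauss(0, 1) for _ in range(dim)]
             for p in itertools.product("ACGT", repeat=k)}
    filters = [[[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(width)]
               for _ in range(n_filters)]
    head = [rng.gauss(0, 0.5) for _ in range(2 * n_filters)]
    return {"k": k, "dim": dim, "width": width,
            "table": table, "filters": filters, "head": head}


def predict(enhancer, promoter, params):
    """Encode each sequence, apply CNN + attention, mean-pool, then score the pair."""
    feats = []
    for seq in (enhancer, promoter):
        x = embed(seq, params["table"], params["k"], params["dim"])
        h = conv1d_relu(x, params["filters"], params["width"])
        h = self_attention(h)
        feats += [sum(col) / len(h) for col in zip(*h)]  # mean over positions
    logit = sum(w * f for w, f in zip(params["head"], feats))
    return sigmoid(logit)


if __name__ == "__main__":
    params = make_params(seed=0)
    print(predict("ACGTACGTTGCAACGT", "TTGACGCATGCAAGT", params))
```

With random weights the printed probability carries no biological meaning. In a trained model the table would come from pre-trained k-mer vectors, and the convolution, attention and output weights would be learned from labelled enhancer-promoter pairs. Because self-attention scales quadratically with the number of positions, placing a convolution (and, typically, pooling) before it keeps the attention step affordable on kilobase-length sequences.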
EPI-Mind_spe predicts EPIs with an average AUROC above 90% and an average AUPR above 70%, but its predictions are cell line specific. EPI-Mind_gen predicts EPIs across different cell lines, and its average AUROC is about 4.8% higher than that of EPI-Mind_spe. EPI-Mind_best outperforms state-of-the-art predictors on benchmark datasets: it achieved the best result on 5 of 12 indicators (AUPR and AUROC values), outperforming previous methods.
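AUROC and AUPR are threshold-free ranking metrics computed from true labels and predicted scores. The sketch below shows one standard way to compute both in pure Python. It assumes AUPR is estimated as average precision, the step-wise estimator used, for example, by scikit-learn's `average_precision_score`; the exact estimator behind the reported values is not stated here. The function names are illustrative, and the pairwise AUROC loop is O(P·N), which is fine for a sketch but not for large test sets.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPR as average precision: sum over thresholds of (R_n - R_{n-1}) * P_n."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    pairs = sorted(zip(scores, labels), reverse=True)
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # consume every example tied at this score before evaluating the threshold
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    y = [1, 0, 1, 0]
    s = [0.9, 0.8, 0.7, 0.1]
    print(auroc(y, s), average_precision(y, s))  # 0.75 and about 0.833
```

A random classifier's expected AUROC is 0.5 regardless of class balance, whereas its expected AUPR equals the fraction of positives. On imbalanced EPI datasets, an AUPR well below the AUROC, as reported for EPI-Mind_spe, is therefore not contradictory.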
This research proposes EPI-Mind, a deep learning-based method that predicts EPIs using only enhancer and promoter sequences. This work may provide a new route to solving the problem.