iEnhancer-GDM: A Deep Learning Framework Based on Generative Adversarial Network and Multi-head Attention Mechanism to Identify Enhancers and Their Strength.

Authors

Yang Xiaomei, Liao Meng, Ye Bin, Xia Junfeng, Zhao Jianping

Affiliations

College of Mathematics and System Sciences, Xinjiang University, Urumqi, 830017, China.

Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, China.

Publication information

Interdiscip Sci. 2025 May 7. doi: 10.1007/s12539-025-00703-9.

Abstract

Enhancers are short DNA fragments capable of significantly increasing the frequency of gene transcription. They often exert their effects on target genes over long distances, in either cis or trans configurations. Identifying enhancers is challenging because of their variable positions and sensitivities. Genetic variants within enhancer regions have been implicated in human diseases, highlighting the critical importance of enhancer identification and strength prediction. Here, we develop a two-layer predictor named iEnhancer-GDM to identify enhancers and to predict enhancer strength. To address the limited size of the enhancer training dataset, which can cause model overfitting and low classification accuracy, we introduce a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) to augment the dataset. We employ a dna2vec embedding layer to encode raw DNA sequences into numerical feature representations, and then integrate a multi-scale convolutional neural network, a bidirectional long short-term memory network, and a multi-head attention mechanism for feature representation and classification. Our results validate the effectiveness of data augmentation with WGAN-GP. On an independent test dataset, iEnhancer-GDM outperforms existing models, with improvements of 2.45% for enhancer identification and 11.5% for enhancer strength prediction. iEnhancer-GDM advances precise enhancer identification and strength prediction, thereby helping to understand the functions of enhancers and their associations in genomics.
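The encoding step mentioned in the abstract (raw DNA sequence to numerical features via a k-mer embedding) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives neither the k-mer length nor the embedding dimension, so `k=3` and `dim=8` are assumptions, and the helper names `kmer_tokens`, `build_embedding_table`, and `encode` are hypothetical. The random vectors stand in for pretrained dna2vec embeddings, which a real pipeline would load instead.

```python
import random

def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_embedding_table(k=3, dim=8, seed=0):
    """Hypothetical stand-in for pretrained dna2vec vectors:
    one random vector per possible k-mer over the alphabet ACGT."""
    rng = random.Random(seed)
    kmers = [""]
    for _ in range(k):
        kmers = [prefix + base for prefix in kmers for base in "ACGT"]
    return {km: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for km in kmers}

def encode(seq, table, k=3):
    """Map a sequence to a list of vectors, one per k-mer.
    k-mers containing ambiguous bases (e.g. N) are skipped."""
    return [table[t] for t in kmer_tokens(seq, k) if t in table]

table = build_embedding_table()
features = encode("ACGTAGCTAGGCTA", table)
print(len(features), len(features[0]))  # 12 k-mers, each an 8-dimensional vector
```

The resulting sequence of vectors is the kind of input that downstream convolutional, recurrent, and attention layers would consume; those layers are omitted here because the abstract does not specify their configuration.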
