School of Computer Sciences, University of South China, Hengyang 421001, China.
Comput Biol Chem. 2023 Aug;105:107905. doi: 10.1016/j.compbiolchem.2023.107905. Epub 2023 Jun 11.
Super-enhancers are large genomic domains in which multiple short typical enhancers lying within a specified genomic distance of one another are stitched together. They are typically cell type-specific and are responsible for defining cell identity and regulating gene transcription. Numerous studies have shown that super-enhancers are enriched for trait-associated variants, and mutations in super-enhancers may be related to known diseases. Recently, several machine learning-based methods have been used to distinguish super-enhancers from typical enhancers using high-throughput data from various experimental assays. However, acquiring such experimental data is usually costly and time-consuming. In this paper, we propose SENet, a deep neural network model that discriminates between the two categories using sequence information alone. SENet uses dna2vec feature embedding, convolution for local feature extraction, attention pooling to retain the most informative features, and a Transformer to capture contextual information. Experiments demonstrate that SENet outperforms all current state-of-the-art computational methods and performs satisfactorily in cross-species validation. Our method is the first to distinguish super-enhancers from typical enhancers using only sequence information. The source code and datasets are available at https://github.com/lhy0322/SENet.
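The abstract names SENet's front-end components without giving their details. The following is a minimal, plain-Python sketch of that kind of pipeline, under stated assumptions. It covers overlapping k-mer tokenization, an embedding lookup, a 1D convolution with ReLU, and a softmax-weighted attention pooling over non-overlapping windows. It is not the authors' implementation. The random embedding table is a hypothetical stand-in for pretrained dna2vec vectors. The values of `k`, the filter width, the pool size and the per-channel `gain` are illustrative assumptions, and whether SENet's attention pooling matches this simplified variant is also an assumption. The Transformer encoder and the classification head are omitted.

```python
import math
import random
from itertools import product


def kmers(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def make_embedding_table(k=3, dim=4, seed=0):
    """Hypothetical stand-in for pretrained dna2vec vectors:
    one random vector per k-mer over the alphabet ACGT."""
    rng = random.Random(seed)
    return {"".join(p): [rng.uniform(-1, 1) for _ in range(dim)]
            for p in product("ACGT", repeat=k)}


def conv1d_relu(x, filters, bias):
    """Valid 1D convolution over a sequence of vectors, followed by ReLU.
    x: L vectors of size d; filters: F filters, each of shape (w, d)."""
    w = len(filters[0])
    d = len(x[0])
    out = []
    for i in range(len(x) - w + 1):
        row = []
        for f, b in zip(filters, bias):
            s = b + sum(f[j][c] * x[i + j][c] for j in range(w) for c in range(d))
            row.append(max(0.0, s))
        out.append(row)
    return out


def attention_pool(x, pool_size, gain):
    """Simplified attention pooling (assumption): within each non-overlapping
    window, each channel is averaged with softmax(gain[c] * value) weights.
    A trailing window shorter than pool_size is dropped."""
    out = []
    channels = len(x[0])
    for start in range(0, len(x) - pool_size + 1, pool_size):
        window = x[start:start + pool_size]
        pooled = []
        for c in range(channels):
            logits = [gain[c] * row[c] for row in window]
            m = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(l - m) for l in logits]
            z = sum(weights)
            pooled.append(sum(wt * row[c] for wt, row in zip(weights, window)) / z)
        out.append(pooled)
    return out


def encode(seq, k=3, dim=4, n_filters=2, width=3, pool_size=2, seed=0):
    """k-mer embedding -> convolution -> attention pooling (hypothetical sizes)."""
    table = make_embedding_table(k, dim, seed)
    # k-mers containing non-ACGT symbols (e.g. N) map to a zero vector
    x = [table.get(km, [0.0] * dim) for km in kmers(seq, k)]
    rng = random.Random(seed + 1)
    filters = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
               for _ in range(n_filters)]
    bias = [0.0] * n_filters
    h = conv1d_relu(x, filters, bias)
    return attention_pool(h, pool_size, gain=[1.0] * n_filters)
```

For a 20-bp input such as `"ACGT" * 5`, the sketch produces 18 three-mers, 16 convolution outputs and 8 pooled vectors of 2 channels each. These vectors would then feed a Transformer encoder. The pooling design interpolates between two familiar operations. With `gain` at 0 it reduces to average pooling, and as `gain` grows it approaches max pooling. This is one way to read "refined feature retention".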