College of Intelligence and Computing, Tianjin University, Tianjin 300350, China.
Genomics. 2021 Nov;113(6):4052-4060. doi: 10.1016/j.ygeno.2021.10.007. Epub 2021 Oct 16.
A super-enhancer (SE) is a cluster of active typical enhancers (TEs) with high levels of the Mediator complex, master transcription factors, and chromatin regulators. SEs play a key role in the control of cell identity and disease. Traditionally, scientists have used a variety of high-throughput data on different transcription factors or chromatin marks to distinguish SEs from TEs. Such experimental methods are usually costly and time-consuming. In this paper, we propose DeepSE, a model based on a deep convolutional neural network, to distinguish SEs from TEs. DeepSE represents DNA sequences using dna2vec feature embeddings. Using only DNA sequence information, DeepSE outperformed all state-of-the-art methods. In addition, DeepSE generalizes well across different cell lines, which implies that cell-type-specific SEs may share hidden sequence patterns across cell lines. The source code and data are available on GitHub (https://github.com/QiaoyingJi/DeepSE).
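To make the sequence representation step concrete, the following minimal sketch shows one way a DNA sequence could be turned into a matrix of k-mer embeddings before being passed to a convolutional network. It is not taken from the DeepSE repository: the function names (`kmers`, `build_toy_embeddings`, `embed`), the choice of k = 3, and the 4-dimensional vectors are all illustrative assumptions, and the random vectors merely stand in for pretrained dna2vec embeddings. The convolutional classifier itself is omitted.

```python
import random
from itertools import product


def kmers(seq, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mers (hypothetical helper)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_toy_embeddings(k=3, dim=4, seed=0):
    """Return a k-mer -> vector table filled with random placeholder values.

    Assumption: a real pipeline would load pretrained dna2vec vectors here
    instead of generating random ones.
    """
    rng = random.Random(seed)
    return {
        "".join(p): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for p in product("ACGT", repeat=k)
    }


def embed(seq, table, k=3):
    """Map a sequence to a list of embedding vectors, one per k-mer.

    K-mers missing from the table (for example, those containing N)
    are mapped to a zero vector.
    """
    dim = len(next(iter(table.values())))
    zero = [0.0] * dim
    return [table.get(km, zero) for km in kmers(seq, k)]


table = build_toy_embeddings(k=3, dim=4)
matrix = embed("ACGTNACG", table, k=3)
# Shape is (number of k-mers) x (embedding dimension).
print(len(matrix), len(matrix[0]))
```

The resulting matrix has one row per k-mer and one column per embedding dimension, which is the kind of input a 1D convolutional layer can scan for local sequence patterns.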