Sequence-Only Prediction of Super-Enhancers in Human Cell Lines Using Transformer Models.

Authors

Kravchuk Ekaterina V, Ashniev German A, Gladkova Marina G, Orlov Alexey V, Zaitseva Zoia G, Malkerov Juri A, Orlova Natalia N

Affiliations

Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia.

Faculty of Biology, Lomonosov Moscow State University, Leninskiye Gory, MSU, 1-12, 119991 Moscow, Russia.

Publication information

Biology (Basel). 2025 Feb 7;14(2):172. doi: 10.3390/biology14020172.

DOI:10.3390/biology14020172
PMID:40001940
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11852244/
Abstract

The study applies transformer-based deep learning models to super-enhancer (SE) prediction in human tumor cell lines, focusing on sequence-only features of super-enhancer and enhancer elements in the human genome. The proposed SE-prediction method uses GENA-LM, which handles long DNA sequences, in a classification task that distinguishes super-enhancers from enhancers, using H3K36me, H3K4me1, H3K4me3 and H3K27ac landscape datasets from HeLa, HEK293, H2171, Jurkat, K562, MM1S and U87 cell lines. The model was fine-tuned on relevant sequence data, allowing the analysis of extended genomic sequences without the epigenetic markers required by earlier approaches. The study reports balanced accuracy surpassing that of previous models such as SENet, particularly in the HEK293 and K562 cell lines. It also showed that super-enhancers frequently co-localize with epigenetic marks such as H3K4me3 and H3K27ac. The attention mechanism of the model provided insights into the sequence features contributing to SE classification, indicating a correlation between sequence-only features and the aforementioned epigenetic landscapes. These findings support the potential use of transformer models in further genomic sequence analysis for bioinformatics applications in enhancer/super-enhancer characterization and gene regulation studies.
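
The abstract reports its results as balanced accuracy but does not define the metric. As a minimal sketch, assuming the standard definition (the mean of per-class recall, which is less sensitive to class imbalance than plain accuracy when super-enhancers are far rarer than typical enhancers), it could be computed as below. The label vectors are hypothetical and are not data from the paper.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    # Number of samples per true class.
    total = Counter(y_true)
    # Number of correctly predicted samples per true class.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical labels: 1 = super-enhancer, 0 = typical enhancer.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
print(f"accuracy={plain_accuracy:.4f}, balanced accuracy={score:.4f}")
```

In this toy case the classifier recovers one of two super-enhancers (recall 0.5) and seven of eight enhancers (recall 0.875), so plain accuracy looks higher than the balanced figure, which weights both classes equally.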

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0618/11852244/bc674a38e87d/biology-14-00172-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0618/11852244/56e2b7ff4143/biology-14-00172-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0618/11852244/22ffdd4e144b/biology-14-00172-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0618/11852244/793d7a40672a/biology-14-00172-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0618/11852244/ced362ddea53/biology-14-00172-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0618/11852244/91d7a70ca940/biology-14-00172-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0618/11852244/16f1279382bf/biology-14-00172-g007.jpg

Similar articles

1
Sequence-Only Prediction of Super-Enhancers in Human Cell Lines Using Transformer Models.
Biology (Basel). 2025 Feb 7;14(2):172. doi: 10.3390/biology14020172.
2
SENet: A deep learning framework for discriminating super- and typical enhancers by sequence information.
Comput Biol Chem. 2023 Aug;105:107905. doi: 10.1016/j.compbiolchem.2023.107905. Epub 2023 Jun 11.
3
Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions.
BMC Syst Biol. 2016 Aug 1;10 Suppl 2(Suppl 2):54. doi: 10.1186/s12918-016-0302-3.
4
DNA methylation regulates discrimination of enhancers from promoters through a H3K4me1-H3K4me3 seesaw mechanism.
BMC Genomics. 2017 Dec 12;18(1):964. doi: 10.1186/s12864-017-4353-7.
5
Chromatin interaction networks revealed unique connectivity patterns of broad H3K4me3 domains and super enhancers in 3D chromatin.
Sci Rep. 2017 Oct 31;7(1):14466. doi: 10.1038/s41598-017-14389-7.
6
Utilizing a deep learning model based on BERT for identifying enhancers and their strength.
PLoS One. 2025 Apr 9;20(4):e0320085. doi: 10.1371/journal.pone.0320085. eCollection 2025.
7
The hyper-activation of transcriptional enhancers in breast cancer.
Clin Epigenetics. 2019 Mar 12;11(1):48. doi: 10.1186/s13148-019-0645-x.
8
The super-enhancer repertoire in porcine liver.
J Anim Sci. 2023 Jan 3;101. doi: 10.1093/jas/skad056.
9
A transformer architecture based on BERT and 2D convolutional neural network to identify DNA enhancers from sequence information.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab005.
10
Genome-wide identification and characterization of DNA enhancers with a stacked multivariate fusion framework.
PLoS Comput Biol. 2022 Dec 15;18(12):e1010779. doi: 10.1371/journal.pcbi.1010779. eCollection 2022 Dec.

References cited in this article

1
GENA-LM: a family of open-source foundational DNA language models for long sequences.
Nucleic Acids Res. 2025 Jan 11;53(2). doi: 10.1093/nar/gkae1310.
2
ChIP-Atlas 3.0: a data-mining suite to explore chromosome architecture together with large-scale regulome data.
Nucleic Acids Res. 2024 Jul 5;52(W1):W45-W53. doi: 10.1093/nar/gkae358.
3
Super-Enhancers and Their Parts: From Prediction Efforts to Pathognomonic Status.
Int J Mol Sci. 2024 Mar 7;25(6):3103. doi: 10.3390/ijms25063103.
4
H3K4me1 facilitates promoter-enhancer interactions and gene activation during embryonic stem cell differentiation.
Mol Cell. 2024 May 2;84(9):1742-1752.e5. doi: 10.1016/j.molcel.2024.02.030. Epub 2024 Mar 20.
5
Hypertensive Pressure Mechanosensing Alone Triggers Lipid Droplet Accumulation and Transdifferentiation of Vascular Smooth Muscle Cells to Foam Cells.
Adv Sci (Weinh). 2024 Mar;11(9):e2308686. doi: 10.1002/advs.202308686. Epub 2023 Dec 25.
6
A lightweight transformer for faster and robust EBSD data collection.
Sci Rep. 2023 Dec 1;13(1):21253. doi: 10.1038/s41598-023-47936-6.
7
SENet: A deep learning framework for discriminating super- and typical enhancers by sequence information.
Comput Biol Chem. 2023 Aug;105:107905. doi: 10.1016/j.compbiolchem.2023.107905. Epub 2023 Jun 11.
8
Experimental Validation and Prediction of Super-Enhancers: Advances and Challenges.
Cells. 2023 Apr 19;12(8):1191. doi: 10.3390/cells12081191.
9
Analysis of super-enhancer using machine learning and its application to medical biology.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad107.
10
H3K4me3 regulates RNA polymerase II promoter-proximal pause-release.
Nature. 2023 Mar;615(7951):339-348. doi: 10.1038/s41586-023-05780-8. Epub 2023 Mar 1.