
PLPMpro: Enhancing promoter sequence prediction with prompt-learning based pre-trained language model.

Affiliations

School of Software, Shandong University, Jinan 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China.

Publication information

Comput Biol Med. 2023 Sep;164:107260. doi: 10.1016/j.compbiomed.2023.107260. Epub 2023 Jul 21.

DOI:10.1016/j.compbiomed.2023.107260
PMID:37557052
Abstract

The promoter region, located proximal to the transcription start sites, controls the initiation of gene transcription by modulating the interaction with RNA polymerase. Accurate recognition of promoter regions is therefore a key problem in bioinformatics. Although some methods leveraging pre-trained language models (PLMs) for promoter prediction have been proposed, the full potential of such PLMs remains largely untapped. In this study, we introduce PLPMpro, a model that combines prompt learning with a pre-trained language model to enhance the prediction of promoter sequences. PLPMpro uses the prompt-learning paradigm to exploit the inherent capacities of the PLM, resulting in substantial improvements in prediction performance. Experimental results demonstrate that prompt learning strengthens the capabilities of the pre-trained model; as a result, PLPMpro outperforms both typical pre-trained-model-based promoter prediction methods and typical deep learning methods. Furthermore, we conduct various experiments to examine the effects of different prompt-learning settings and different numbers of soft modules on model performance. More importantly, an interpretation experiment reveals that the pre-trained model captures biological semantics. Collectively, this research offers a new perspective on how best to use PLMs to address biological problems.
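The abstract does not spell out PLPMpro's architecture, so the following is only a minimal sketch of the generic prompt-learning pieces it refers to, under stated assumptions. It shows (1) splitting DNA into overlapping k-mers, as DNABERT-style models do (an assumption: the abstract does not name the PLM or its tokenizer), (2) wrapping those tokens in a template with a configurable number of soft-token slots plus a [MASK] position (we assume "soft modules" means learnable soft-prompt tokens of this kind), and (3) a verbalizer that turns the PLM's logits at [MASK] into class probabilities. The function names, special tokens, label words, and logits are all hypothetical; no model is loaded.

```python
import math
from typing import Dict, List


def kmer_tokenize(seq: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    Assumption: DNABERT-style k-mer tokenization; the abstract does not
    state which tokenizer PLPMpro's PLM uses.
    """
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError(f"sequence length {len(seq)} is shorter than k={k}")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_prompt(kmers: List[str], n_soft: int) -> List[str]:
    """Wrap k-mer tokens in a prompt template (hypothetical layout).

    [SOFT_i] marks slots whose embeddings would be trainable vectors in
    prompt tuning; [MASK] is where the PLM predicts a label word.
    """
    if n_soft < 0:
        raise ValueError("n_soft must be non-negative")
    soft = [f"[SOFT_{i}]" for i in range(n_soft)]
    return ["[CLS]"] + soft + kmers + ["[MASK]", "[SEP]"]


def verbalize(mask_logits: Dict[str, float],
              label_words: Dict[str, str]) -> Dict[str, float]:
    """Convert logits at the [MASK] position into class probabilities.

    label_words maps each class to the vocabulary token that stands for it
    (a hypothetical choice). Only those tokens' logits enter the softmax.
    """
    scores = {cls: mask_logits[word] for cls, word in label_words.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


if __name__ == "__main__":
    tokens = build_prompt(kmer_tokenize("TATAAAAGGC", k=6), n_soft=2)
    print(tokens)
    # ['[CLS]', '[SOFT_0]', '[SOFT_1]', 'TATAAA', 'ATAAAA', 'TAAAAG', 'AAAAGG', 'AAAGGC', '[MASK]', '[SEP]']

    # Hand-written logits standing in for a real PLM's output at [MASK];
    # "ACGTAC" has the highest logit but is ignored because it is not a label word.
    fake_logits = {"YES": 2.0, "NO": 0.0, "ACGTAC": 5.0}
    probs = verbalize(fake_logits, {"promoter": "YES", "non-promoter": "NO"})
    print(round(probs["promoter"], 4))  # 0.8808
```

In actual prompt tuning, the [SOFT_i] slots would typically be trainable embedding vectors prepended to the input embeddings, and a cross-entropy loss on the verbalizer's output would drive training. Varying `n_soft` is one way to mirror the soft-module ablation the abstract mentions, though the paper's exact settings are not given here.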

Similar articles

1. PLPMpro: Enhancing promoter sequence prediction with prompt-learning based pre-trained language model.
   Comput Biol Med. 2023 Sep;164:107260. doi: 10.1016/j.compbiomed.2023.107260. Epub 2023 Jul 21.
2. iProL: identifying DNA promoters from sequence information based on Longformer pre-trained model.
   BMC Bioinformatics. 2024 Jun 25;25(1):224. doi: 10.1186/s12859-024-05849-9.
3. Toward a stable and low-resource PLM-based medical diagnostic system via prompt tuning and MoE structure.
   Sci Rep. 2023 Aug 3;13(1):12595. doi: 10.1038/s41598-023-39543-2.
4. msBERT-Promoter: a multi-scale ensemble predictor based on BERT pre-trained model for the two-stage prediction of DNA promoters and their strengths.
   BMC Biol. 2024 May 30;22(1):126. doi: 10.1186/s12915-024-01923-z.
5. PredPromoter-MF(2L): A Novel Approach of Promoter Prediction Based on Multi-source Feature Fusion and Deep Forest.
   Interdiscip Sci. 2022 Sep;14(3):697-711. doi: 10.1007/s12539-022-00520-4. Epub 2022 Apr 30.
6. Chemical-Protein Relation Extraction with Pre-trained Prompt Tuning.
   Proc (IEEE Int Conf Healthc Inform). 2022 Jun;2022:608-609. doi: 10.1109/ichi54592.2022.00120. Epub 2022 Sep 8.
7. PromGER: Promoter Prediction Based on Graph Embedding and Ensemble Learning for Eukaryotic Sequence.
   Genes (Basel). 2023 Jul 13;14(7):1441. doi: 10.3390/genes14071441.
8. HealthPrompt: A Zero-shot Learning Paradigm for Clinical Natural Language Processing.
   AMIA Annu Symp Proc. 2023 Apr 29;2022:972-981. eCollection 2022.
9. BERT-Promoter: An improved sequence-based predictor of DNA promoter using BERT pre-trained model and SHAP feature selection.
   Comput Biol Chem. 2022 Aug;99:107732. doi: 10.1016/j.compbiolchem.2022.107732. Epub 2022 Jul 14.
10. Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks.
   PLoS One. 2017 Feb 3;12(2):e0171410. doi: 10.1371/journal.pone.0171410. eCollection 2017.

Cited by

1. PLM-ATG: Identification of Autophagy Proteins by Integrating Protein Language Model Embeddings with PSSM-Based Features.
   Molecules. 2025 Apr 10;30(8):1704. doi: 10.3390/molecules30081704.
2. DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models.
   Front Med (Lausanne). 2025 Apr 8;12:1503229. doi: 10.3389/fmed.2025.1503229. eCollection 2025.
3. StackER: a novel SMILES-based stacked approach for the accelerated and efficient discovery of ERα and ERβ antagonists.
   Sci Rep. 2023 Dec 27;13(1):22994. doi: 10.1038/s41598-023-50393-w.