
Offline prompt reinforcement learning method based on feature extraction

Authors

Yao Tianlei, Chen Xiliang, Yao Yi, Huang Weiye, Chen Zhaoyang

Affiliation

College of Command and Control Engineering, Army Engineering University of PLA, Nanjing, China.

Publication

PeerJ Comput Sci. 2025 Jan 2;11:e2490. doi: 10.7717/peerj-cs.2490. eCollection 2025.

DOI: 10.7717/peerj-cs.2490
PMID: 39896020
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11784719/
Abstract

Recent studies have shown that combining Transformers with conditional policies for offline reinforcement learning can yield better results. However, whereas in a conventional reinforcement learning scenario the agent receives single-frame observations one at a time in their natural chronological order, a Transformer receives a series of observations at each step. Individual features therefore cannot be extracted efficiently enough to support accurate decisions, and it remains difficult to generalize to out-of-distribution data. We focus on the few-shot learning capability of pre-trained models and combine it with prompt learning to enhance real-time policy adjustment. By sampling specific information from the offline dataset as trajectory samples, task information is encoded to help the pre-trained model quickly grasp the task characteristics and the sequence-generation paradigm, and thus adapt quickly to downstream tasks. To capture dependencies in the sequence more accurately, we also divide the state information in the input trajectory into fixed-size blocks, extract features from each sub-block separately, and finally encode the whole sequence into the GPT model to generate decisions more accurately. Experiments show that the proposed method outperforms the baseline methods on related tasks, generalizes better to new environments and tasks, and effectively improves the stability and accuracy of agent decision-making.

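The pipeline described in the abstract — sampling a prompt trajectory from the offline dataset to encode task information, splitting each state into fixed-size sub-blocks, extracting features from each sub-block separately, and feeding the combined token sequence to a GPT backbone — can be sketched as below. This is a minimal illustration only: all names are hypothetical, the linear per-block extractor is a toy stand-in for the paper's feature network, and mean-pooling the sub-block features into one state token is an assumption, not the authors' design.

```python
import numpy as np

def split_state(state, block_size):
    """Split a flat state vector into fixed-size sub-blocks (zero-padded)."""
    n_blocks = int(np.ceil(len(state) / block_size))
    padded = np.zeros(n_blocks * block_size)
    padded[:len(state)] = state
    return padded.reshape(n_blocks, block_size)

class BlockFeatureExtractor:
    """Toy linear map applied to each sub-block independently (hypothetical)."""
    def __init__(self, block_size, embed_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((block_size, embed_dim)) * 0.02

    def __call__(self, blocks):
        # blocks: (n_blocks, block_size) -> (n_blocks, embed_dim)
        return blocks @ self.W

def build_input_sequence(prompt_traj, current_traj, extractor, block_size):
    """Concatenate prompt and current trajectories into one token sequence.

    Each trajectory step is a (return_to_go, state, action) triple; the state
    is split into sub-blocks, per-block features are extracted, and here they
    are mean-pooled into a single state token before the GPT would see them.
    """
    tokens = []
    for rtg, state, action in prompt_traj + current_traj:
        state_feats = extractor(split_state(state, block_size))
        state_token = state_feats.mean(axis=0)  # pool sub-block features
        tokens.append(("rtg", rtg))
        tokens.append(("state", state_token))
        tokens.append(("action", action))
    return tokens
```

In this sketch the prompt trajectory plays the role the abstract assigns to it: a few dataset-sampled steps prepended to the current trajectory so that a pre-trained sequence model can infer the task before predicting actions.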

Figures (PMC full text)

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/11a6a04b58cc/peerj-cs-11-2490-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/98d0987f4e05/peerj-cs-11-2490-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/014687bfa2bc/peerj-cs-11-2490-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/269c5b832f00/peerj-cs-11-2490-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/d7ae3ab55797/peerj-cs-11-2490-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/c7082779ead2/peerj-cs-11-2490-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/c2f5aecb98c6/peerj-cs-11-2490-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/005de6aa1c61/peerj-cs-11-2490-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/293db57aa5c6/peerj-cs-11-2490-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/83a9a97b512f/peerj-cs-11-2490-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/ddc8664b3776/peerj-cs-11-2490-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/38e42fbae85f/peerj-cs-11-2490-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/baf33cd7396c/peerj-cs-11-2490-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/a0e375041dd8/peerj-cs-11-2490-g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/eca3979033bd/peerj-cs-11-2490-g015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/ce3719b340bf/peerj-cs-11-2490-g016.jpg

Similar Articles

1. Offline prompt reinforcement learning method based on feature extraction. PeerJ Comput Sci. 2025 Jan 2;11:e2490. doi: 10.7717/peerj-cs.2490. eCollection 2025.
2. Offline reinforcement learning combining generalized advantage estimation and modality decomposition interaction. Sci Rep. 2025 May 4;15(1):15601. doi: 10.1038/s41598-025-98572-1.
3. Improving large language models for clinical named entity recognition via prompt engineering. J Am Med Inform Assoc. 2024 Sep 1;31(9):1812-1820. doi: 10.1093/jamia/ocad259.
4. SensitiveCancerGPT: Leveraging Generative Large Language Model on Structured Omics Data to Optimize Drug Sensitivity Prediction. bioRxiv. 2025 Mar 3:2025.02.27.640661. doi: 10.1101/2025.02.27.640661.
5. Offline Model-Based Adaptable Policy Learning for Decision-Making in Out-of-Support Regions. IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15260-15274. doi: 10.1109/TPAMI.2023.3317131. Epub 2023 Nov 3.
6. An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study. JMIR Med Inform. 2024 Apr 8;12:e55318. doi: 10.2196/55318.
7. Generative Pre-trained Transformer (GPT) based model with relative attention for de novo drug design. Comput Biol Chem. 2023 Oct;106:107911. doi: 10.1016/j.compbiolchem.2023.107911. Epub 2023 Jun 29.
8. PaCAR: COVID-19 Pandemic Control Decision Making via Large-Scale Agent-Based Modeling and Deep Reinforcement Learning. Med Decis Making. 2022 Nov;42(8):1064-1077. doi: 10.1177/0272989X221107902. Epub 2022 Jul 1.
9. Few-Shot Learning for Clinical Natural Language Processing Using Siamese Neural Networks: Algorithm Development and Validation Study. JMIR AI. 2023 May 4;2:e44293. doi: 10.2196/44293.
10. Multi-hop interpretable meta learning for few-shot temporal knowledge graph completion. Neural Netw. 2025 Mar;183:106981. doi: 10.1016/j.neunet.2024.106981. Epub 2024 Nov 28.

References Cited in This Article

1. A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems. IEEE Trans Neural Netw Learn Syst. 2024 Aug;35(8):10237-10257. doi: 10.1109/TNNLS.2023.3250269. Epub 2024 Aug 5.
2. StARformer: Transformer With State-Action-Reward Representations for Robot Learning. IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):12862-12877. doi: 10.1109/TPAMI.2022.3204708. Epub 2023 Oct 3.
3. ASI-DBNet: An Adaptive Sparse Interactive ResNet-Vision Transformer Dual-Branch Network for the Grading of Brain Cancer Histopathological Images. Interdiscip Sci. 2023 Mar;15(1):15-31. doi: 10.1007/s12539-022-00532-0. Epub 2022 Jul 9.
4. AMMU: A survey of transformer-based biomedical pretrained language models. J Biomed Inform. 2022 Feb;126:103982. doi: 10.1016/j.jbi.2021.103982. Epub 2021 Dec 31.