Offline prompt reinforcement learning method based on feature extraction.

Authors

Yao Tianlei, Chen Xiliang, Yao Yi, Huang Weiye, Chen Zhaoyang

Affiliations

College of Command and Control Engineering, Army Engineering University of PLA, Nanjing, China.

Publication

PeerJ Comput Sci. 2025 Jan 2;11:e2490. doi: 10.7717/peerj-cs.2490. eCollection 2025.

Abstract

Recent studies have shown that combining Transformers with conditional policies to handle offline reinforcement learning yields better results. However, in a conventional reinforcement learning scenario the agent receives observations one frame at a time in their natural chronological order, whereas a Transformer receives a whole sequence of observations at each step. Individual features therefore cannot be extracted efficiently enough to support accurate decisions, and generalizing to out-of-distribution data remains difficult. We focus on the few-shot learning capability of pre-trained models and combine it with prompt learning to strengthen real-time policy adjustment. By sampling task-specific information from the offline dataset as trajectory prompts, we encode the task information so that the pre-trained model quickly grasps the task characteristics and, under the sequence-generation paradigm, adapts rapidly to downstream tasks. To capture the dependencies in the sequence more accurately, we also divide the state information in the input trajectory into fixed-size blocks, extract features from each sub-block separately, and finally encode the whole sequence into the GPT model to generate decisions more accurately. Experiments show that the proposed method outperforms the baseline methods on related tasks, generalizes better to new environments and tasks, and effectively improves the stability and accuracy of agent decision-making.
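
To make the pipeline concrete, the sketch below illustrates the two mechanisms the abstract describes: sampling a short trajectory prompt from the offline dataset to condition the pre-trained model, and splitting each state vector into fixed-size sub-blocks whose features are extracted separately before the sequence enters a GPT-style causal backbone. This is a minimal PyTorch sketch under assumed shapes and names; BlockFeatureEncoder, sample_trajectory_prompt, PromptBlockGPT, and all hyperparameters are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn


class BlockFeatureEncoder(nn.Module):
    """Split each state vector into fixed-size sub-blocks, run a shared
    feature extractor over every sub-block, and fuse the results into a
    single token embedding."""

    def __init__(self, state_dim, block_size, embed_dim):
        super().__init__()
        # Assumption: state_dim divides evenly; real code would pad otherwise.
        assert state_dim % block_size == 0
        self.num_blocks = state_dim // block_size
        self.block_size = block_size
        self.block_proj = nn.Linear(block_size, embed_dim)
        self.fuse = nn.Linear(self.num_blocks * embed_dim, embed_dim)

    def forward(self, states):
        # states: (batch, seq_len, state_dim)
        b, t, _ = states.shape
        blocks = states.reshape(b, t, self.num_blocks, self.block_size)
        feats = torch.relu(self.block_proj(blocks))      # (b, t, K, embed)
        return self.fuse(feats.flatten(start_dim=2))     # (b, t, embed)


def sample_trajectory_prompt(task_trajs, prompt_len=5):
    """Sample a short (return-to-go, state, action) segment from one stored
    trajectory of the target task to serve as the conditioning prompt."""
    traj = task_trajs[torch.randint(len(task_trajs), (1,)).item()]
    t = traj["states"].size(0)
    start = torch.randint(max(1, t - prompt_len), (1,)).item()
    sl = slice(start, start + prompt_len)
    return traj["rtg"][sl], traj["states"][sl], traj["actions"][sl]


class PromptBlockGPT(nn.Module):
    """Decision-Transformer-style causal model: a sampled prompt trajectory
    is prepended to the current trajectory, and states pass through the
    block-wise encoder before the GPT backbone."""

    def __init__(self, state_dim, act_dim, block_size=8, embed_dim=128,
                 n_layer=3, n_head=4, max_len=1024):
        super().__init__()
        self.state_enc = BlockFeatureEncoder(state_dim, block_size, embed_dim)
        self.act_enc = nn.Linear(act_dim, embed_dim)
        self.rtg_enc = nn.Linear(1, embed_dim)           # return-to-go scalar
        self.pos = nn.Embedding(max_len, embed_dim)
        layer = nn.TransformerEncoderLayer(
            embed_dim, n_head, dim_feedforward=4 * embed_dim, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layer)
        self.act_head = nn.Linear(embed_dim, act_dim)

    def _tokens(self, rtg, states, actions):
        # Interleave (return-to-go, state, action) per timestep, as in
        # Decision Transformer. Shapes: rtg (b, t, 1), states (b, t, state_dim),
        # actions (b, t, act_dim).
        toks = torch.stack(
            [self.rtg_enc(rtg), self.state_enc(states), self.act_enc(actions)],
            dim=2)                                       # (b, t, 3, embed)
        return toks.flatten(1, 2)                        # (b, 3t, embed)

    def forward(self, prompt, rtg, states, actions):
        # prompt: a (rtg, states, actions) tuple from sample_trajectory_prompt,
        # batched to (b, prompt_len, ...). Total token count must stay < max_len.
        x = torch.cat([self._tokens(*prompt),
                       self._tokens(rtg, states, actions)], dim=1)
        x = x + self.pos(torch.arange(x.size(1), device=x.device))
        mask = nn.Transformer.generate_square_subsequent_mask(
            x.size(1)).to(x.device)
        h = self.backbone(x, mask=mask)
        # Predict the next action from each state token (every third token,
        # offset 1); prompt positions included here for simplicity.
        return self.act_head(h[:, 1::3])

In this sketch the prompt tokens are simply prepended to the current trajectory, so the same causal attention conditions every prediction on the task prompt; a full implementation would mask the prompt positions out of the action loss and resample prompts per task during training.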

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8858/11784719/11a6a04b58cc/peerj-cs-11-2490-g001.jpg
