

StARformer: Transformer With State-Action-Reward Representations for Robot Learning

Authors

Shang Jinghuan, Li Xiang, Kahatapitiya Kumara, Lee Yu-Cheol, Ryoo Michael S

Publication

IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):12862-12877. doi: 10.1109/TPAMI.2022.3204708. Epub 2023 Oct 3.

DOI: 10.1109/TPAMI.2022.3204708
PMID: 36067106
Abstract

Reinforcement Learning (RL) can be considered as a sequence modeling task, where an agent employs a sequence of past state-action-reward experiences to predict a sequence of future actions. In this work, we propose State-Action-Reward Transformer (StARformer), a Transformer architecture for robot learning with image inputs, which explicitly models short-term state-action-reward representations (StAR-representations), essentially introducing a Markovian-like inductive bias to improve long-term modeling. StARformer first extracts StAR-representations using self-attending patches of image states, action, and reward tokens within a short temporal window. These StAR-representations are combined with pure image state representations, extracted as convolutional features, to perform self-attention over the whole sequence. Our experimental results show that StARformer outperforms the state-of-the-art Transformer-based method on image-based Atari and DeepMind Control Suite benchmarks, under both offline-RL and imitation learning settings. We find that models can benefit from our combination of patch-wise and convolutional image embeddings. StARformer is also more compliant with longer sequences of inputs than the baseline method. Finally, we demonstrate how StARformer can be successfully applied to a real-world robot imitation learning setting via a human-following task.
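The short-window token construction described in the abstract — grouping image-state patch tokens with action and reward tokens per timestep before long-range attention — can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the helper names (`patchify`, `star_tokens`) and the toy random linear projections (`W_patch`, `W_act`, `W_rew`) are assumptions for demonstration only.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened square patches."""
    H, W, C = image.shape
    p = patch_size
    return (image.reshape(H // p, p, W // p, p, C)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(-1, p * p * C))  # (num_patches, patch_dim)

def star_tokens(states, actions, rewards, patch_size, d_model, rng):
    """Build one token group per step: [reward, action, state patches],
    each linearly projected to a shared d_model embedding dimension."""
    patch_dim = patch_size * patch_size * states.shape[-1]
    W_patch = rng.normal(size=(patch_dim, d_model)) * 0.02  # toy projections
    W_act = rng.normal(size=(1, d_model)) * 0.02
    W_rew = rng.normal(size=(1, d_model)) * 0.02
    groups = []
    for s, a, r in zip(states, actions, rewards):
        p_tok = patchify(s, patch_size) @ W_patch   # (num_patches, d_model)
        a_tok = np.array([[a]]) @ W_act             # (1, d_model)
        r_tok = np.array([[r]]) @ W_rew             # (1, d_model)
        groups.append(np.concatenate([r_tok, a_tok, p_tok], axis=0))
    return np.stack(groups)  # (T, 2 + num_patches, d_model)

rng = np.random.default_rng(0)
T, H, W, C, P, D = 4, 8, 8, 3, 4, 16
states = rng.random((T, H, W, C))
actions = rng.integers(0, 6, size=T).astype(float)
rewards = rng.random(T)
tokens = star_tokens(states, actions, rewards, P, D, rng)
print(tokens.shape)  # (4, 6, 16): 4 steps, 2 + (8//4)*(8//4) = 6 tokens each
```

In the paper, self-attention is applied within each such group to form a StAR-representation, which is then combined with a convolutional embedding of the full frame for sequence-level attention; the sketch above covers only the tokenization step.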


Similar Articles

1
StARformer: Transformer With State-Action-Reward Representations for Robot Learning.
IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):12862-12877. doi: 10.1109/TPAMI.2022.3204708. Epub 2023 Oct 3.
2
STACoRe: Spatio-temporal and action-based contrastive representations for reinforcement learning in Atari.
Neural Netw. 2023 Mar;160:1-11. doi: 10.1016/j.neunet.2022.12.018. Epub 2022 Dec 29.
3
Masked and Inverse Dynamics Modeling for Data-Efficient Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):8814-8827. doi: 10.1109/TNNLS.2024.3439261. Epub 2025 May 2.
4
Sample Efficient Deep Reinforcement Learning With Online State Abstraction and Causal Transformer Model Prediction.
IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):16574-16588. doi: 10.1109/TNNLS.2023.3296642. Epub 2024 Oct 29.
5
A Hybrid Online Off-Policy Reinforcement Learning Agent Framework Supported by Transformers.
Int J Neural Syst. 2023 Dec;33(12):2350065. doi: 10.1142/S012906572350065X. Epub 2023 Oct 20.
6
Masked Contrastive Representation Learning for Reinforcement Learning.
IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3421-3433. doi: 10.1109/TPAMI.2022.3176413. Epub 2023 Feb 3.
7
MAE-TransRNet: An improved transformer-ConvNet architecture with masked autoencoder for cardiac MRI registration.
Front Med (Lausanne). 2023 Mar 9;10:1114571. doi: 10.3389/fmed.2023.1114571. eCollection 2023.
8
MMNet: A Mixing Module Network for Polyp Segmentation.
Sensors (Basel). 2023 Aug 18;23(16):7258. doi: 10.3390/s23167258.
9
Medical Image Segmentation Using Transformer Networks.
IEEE Access. 2022;10:29322-29332. doi: 10.1109/access.2022.3156894. Epub 2022 Mar 4.
10
A reinforcement learning algorithm acquires demonstration from the training agent by dividing the task space.
Neural Netw. 2023 Jul;164:419-427. doi: 10.1016/j.neunet.2023.04.042. Epub 2023 May 5.

Cited By

1
Offline prompt reinforcement learning method based on feature extraction.
PeerJ Comput Sci. 2025 Jan 2;11:e2490. doi: 10.7717/peerj-cs.2490. eCollection 2025.