Precise and dexterous robotic manipulation via human-in-the-loop reinforcement learning.

Authors

Luo Jianlan, Xu Charles, Wu Jeffrey, Levine Sergey

Affiliation

Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA.

Publication

Sci Robot. 2025 Aug 20;10(105):eads5033. doi: 10.1126/scirobotics.ads5033.

DOI: 10.1126/scirobotics.ads5033
PMID: 40834062

Abstract

Robotic manipulation remains one of the most difficult challenges in robotics, with approaches ranging from classical model-based control to modern imitation learning. Although these methods have enabled substantial progress, they often require extensive manual design, struggle with performance, and demand large-scale data collection. These limitations hinder their real-world deployment at scale, where reliability, speed, and robustness are essential. Reinforcement learning (RL) offers a powerful alternative by enabling robots to autonomously acquire complex manipulation skills through interaction. However, realizing the full potential of RL in the real world remains challenging because of issues of sample efficiency and safety. We present a human-in-the-loop, vision-based RL system that achieved strong performance on a wide range of dexterous manipulation tasks, including precise assembly, dynamic manipulation, and dual-arm coordination. These tasks reflect realistic industrial tolerances, with small but critical variations in initial object placements that demand sophisticated reactive control. Our method integrates demonstrations, human corrections, sample-efficient RL algorithms, and system-level design to directly learn RL policies in the real world. Within 1 to 2.5 hours of real-world training, our approach outperformed other baselines by improving task success by 2×, achieving near-perfect success rates, and executing 1.8× faster on average. Through extensive experiments and analysis, our results suggest that RL can learn a wide range of complex vision-based manipulation policies directly in the real world within practical training times. We hope that this work will inspire a new generation of learned robotic manipulation techniques, benefiting both industrial applications and research advancements.
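
The abstract names four ingredients: demonstrations, human corrections, sample-efficient RL, and system-level design. As a rough illustration of how human corrections can feed a training loop, the Python sketch below runs a toy 1-D task in which a stand-in "human" overrides the policy's action whenever it moves away from the goal, and stores those overrides in a separate buffer that is sampled alongside online data. Everything here (the task, `toy_step`, `human_correction`, the half-and-half batch split, and all other names) is a hypothetical assumption made for illustration. It is not the authors' system, and it deliberately omits the RL update itself.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Transition:
    state: float
    action: float
    reward: float
    next_state: float
    from_human: bool  # True if a human correction replaced the policy's action


@dataclass
class Buffers:
    demos: list = field(default_factory=list)   # human-provided data (corrections here)
    online: list = field(default_factory=list)  # every transition seen during training


def toy_step(state, action):
    """Hypothetical 1-D task: drive the state to 0; reward 1 once it is close enough."""
    next_state = state + action
    reward = 1.0 if abs(next_state) < 0.05 else 0.0
    return next_state, reward


def policy_action(state, gain):
    """Stand-in policy: a proportional controller with a single parameter."""
    return -gain * state


def human_correction(state, action):
    """Stand-in operator: intervene only when the action moves away from the goal."""
    if abs(state + action) > abs(state):
        return -0.5 * state
    return None


def run_episode(gain, buffers, rng, horizon=20):
    """Roll out one episode, letting the 'human' override unsafe actions."""
    state = rng.uniform(-1.0, 1.0)
    for _ in range(horizon):
        proposed = policy_action(state, gain)
        correction = human_correction(state, proposed)
        action = proposed if correction is None else correction
        next_state, reward = toy_step(state, action)
        step = Transition(state, action, reward, next_state, correction is not None)
        buffers.online.append(step)
        if step.from_human:
            buffers.demos.append(step)
        state = next_state
        if reward > 0:
            return True  # task solved
    return False


def sample_batch(buffers, rng, batch_size=8):
    """Draw half of each batch from human data when any exists, the rest from online data."""
    half = batch_size // 2
    demo_part = [rng.choice(buffers.demos) for _ in range(half)] if buffers.demos else []
    online_part = [rng.choice(buffers.online) for _ in range(batch_size - len(demo_part))]
    return demo_part + online_part
```

In this toy, calling `run_episode(-0.3, Buffers(), random.Random(0))` uses a deliberately bad policy, so every step is corrected and lands in both buffers, while a reasonable gain such as `0.5` triggers no interventions. A learner would then consume batches from `sample_batch`; which RL algorithm does that is left open here.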

Cited by

1
Design of a PEBA-Silicone Composite Magneto-Sensitive Airbag Sensor for Simultaneous Contact Force and Motion Detection.
Sensors (Basel). 2025 Sep 18;25(18):5823. doi: 10.3390/s25185823.