Graduate School of Automotive Engineering, Kookmin University, Seoul 02707, Republic of Korea.
Department of Electrical Engineering, Pusan National University, Busan 46241, Republic of Korea.
Sensors (Basel). 2023 May 21;23(10):4946. doi: 10.3390/s23104946.
Reinforcement learning agents must remain robust in test environments that were not seen during training. However, generalization is difficult to achieve in reinforcement learning that takes high-dimensional images as input. Adding a self-supervised learning framework with data augmentation to the reinforcement learning architecture can promote generalization to some extent; however, excessively large changes to the input images may disturb the reinforcement learning itself. We therefore propose a contrastive learning method that manages the trade-off between reinforcement learning performance and the auxiliary task as the data augmentation strength increases. In this framework, strong augmentation does not disturb reinforcement learning and instead maximizes the auxiliary effect on generalization. Experiments on the DeepMind Control suite demonstrate that the proposed method exploits strong data augmentation effectively and achieves better generalization than existing methods.
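The contrastive auxiliary objective described in the abstract can be sketched with an InfoNCE-style loss over two augmented views of the same observations. This is a minimal NumPy illustration under assumptions of ours, not the paper's implementation: the function `info_nce_loss`, the additive-noise stand-ins for weak and strong augmentation, and all parameter values are hypothetical.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss: embedding i in `anchors` should be
    most similar to embedding i in `positives`; all other rows act as
    negatives. Returns the mean negative log-probability of the match."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                    # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                # diagonal = matching pairs

# Hypothetical embeddings of a batch of observations; additive Gaussian noise
# stands in for weak vs. strong image augmentation.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 32))
weak = z + 0.01 * rng.normal(size=z.shape)            # weak augmentation view
strong = z + 0.5 * rng.normal(size=z.shape)           # strong augmentation view

loss_weak = info_nce_loss(z, weak)
loss_strong = info_nce_loss(z, strong)
```

Stronger augmentation drives the two views of the same observation further apart, so the contrastive loss grows with augmentation strength; the paper's framework is concerned with exploiting this signal without letting the distorted inputs degrade the reinforcement learning objective itself.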