Sun Haibo, Zhu Feng, Li Yangyang, Zhao Pengfei, Kong Yanzi, Wang Jianyu, Wan Yingcai, Fu Shuangfei
Faculty of Robot Science and Engineering, Northeastern University, Shenyang, China.
Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang, China.
Front Neurorobot. 2023 Feb 24;17:1093132. doi: 10.3389/fnbot.2023.1093132. eCollection 2023.
Active object recognition (AOR) provides a paradigm in which an agent captures additional evidence by purposefully changing its viewpoint to improve recognition quality. One of the central problems in AOR is viewpoint planning (VP), which refers to developing a policy that determines the agent's next viewpoints. A current research trend is to solve the VP problem with reinforcement learning, namely to use the viewpoint transitions explored by the agent to train the VP policy. However, most existing work discards transitions once they have been used for training, which makes inefficient use of the explored transitions. To address this problem, we present a novel VP method with transition management based on reinforcement learning, which can reuse the explored viewpoint transitions. Specifically, a learning framework for the VP policy is first established based on deterministic policy gradient theory, which makes it possible to reuse the explored transitions. Then, we design a viewpoint transition management scheme that stores the explored transitions and decides which transitions are used for policy learning. Finally, within this framework, we develop an algorithm based on twin delayed deep deterministic policy gradient and the designed scheme to train the VP policy. Experiments on the public and challenging GERMS dataset show the effectiveness of our method in comparison with several competing approaches.
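To make the idea of storing and reusing explored transitions concrete, the following is a minimal sketch of a bounded transition store of the kind commonly paired with off-policy methods such as deterministic policy gradient. It is not the scheme proposed in the paper: the abstract does not specify how transitions are selected, so uniform random sampling and oldest-first eviction are assumptions made purely for illustration, and the names `Transition`, `TransitionStore`, and all field names are hypothetical.

```python
import random
from collections import deque
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One explored viewpoint transition (all field names are hypothetical)."""
    state: Sequence[float]       # e.g. recognition belief at the current viewpoint
    action: float                # e.g. continuous viewpoint displacement
    reward: float
    next_state: Sequence[float]
    done: bool


class TransitionStore:
    """Bounded store that keeps explored transitions for later reuse.

    Assumption: the oldest transitions are evicted first and mini-batches
    are drawn uniformly at random. The paper's management scheme may differ.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque with maxlen silently drops the oldest item when full
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Never request more items than are currently stored
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self) -> int:
        return len(self._buffer)


# Usage: store five transitions in a store of capacity three, then sample two.
store = TransitionStore(capacity=3)
for step in range(5):
    store.add(Transition([float(step)], 0.1, 0.0, [float(step + 1)], False))
batch = store.sample(2)
```

Because an off-policy learner can train on transitions gathered under earlier versions of the policy, each stored transition can contribute to many updates rather than being discarded after one use.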