Karbasi Seyed Mojtaba, Jensenius Alexander Refsum, Godøy Rolf Inge, Torresen Jim
RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway.
Department of Informatics, University of Oslo, Oslo, Norway.
Front Robot AI. 2024 Nov 18;11:1450097. doi: 10.3389/frobt.2024.1450097. eCollection 2024.
This paper investigates the potential of the intrinsically motivated reinforcement learning (IMRL) approach for robotic drumming. For this purpose, we implemented an IMRL-based algorithm for a drumming robot called ZRob, an underactuated two-DoF robotic arm with flexible grippers. Two ZRob robots were instructed to play rhythmic patterns derived from MIDI files. The RL algorithm is based on the deep deterministic policy gradient (DDPG) method, but instead of relying solely on extrinsic rewards, the robots are trained using a combination of both extrinsic and intrinsic reward signals. The results of the training experiments show that the utilization of intrinsic reward can lead to meaningful novel rhythmic patterns, while using only extrinsic reward would lead to predictable patterns identical to the MIDI inputs. Additionally, the observed drumming patterns are influenced not only by the learning algorithm but also by the robots' physical dynamics and the drum's constraints. This work suggests new insights into the potential of embodied intelligence for musical performance.
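The combination of extrinsic and intrinsic reward signals described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how either reward is computed, so everything here is an assumption made for illustration: the extrinsic term is taken to be the fraction of time steps on which the played pattern matches the MIDI-derived target, the intrinsic term is a simple count-based novelty bonus (a stand-in technique, not necessarily the one used in the paper), and the names `extrinsic_reward`, `CountNovelty`, `combined_reward`, and the weight `beta` are hypothetical. The DDPG agent itself is omitted.

```python
from collections import Counter


def extrinsic_reward(played, target):
    """Assumed extrinsic term: fraction of steps where the played
    pattern matches the MIDI-derived target pattern."""
    if len(played) != len(target):
        raise ValueError("patterns must have the same length")
    matches = sum(p == t for p, t in zip(played, target))
    return matches / len(target)


class CountNovelty:
    """Assumed intrinsic term: a count-based novelty bonus that
    pays 1/sqrt(n) on the n-th occurrence of a pattern."""

    def __init__(self):
        self.counts = Counter()

    def reward(self, played):
        key = tuple(played)
        self.counts[key] += 1
        return self.counts[key] ** -0.5


def combined_reward(played, target, novelty, beta=0.5):
    """Weighted sum of the two terms; beta is a hypothetical weight."""
    return extrinsic_reward(played, target) + beta * novelty.reward(played)


if __name__ == "__main__":
    target = [1, 0, 1, 0]   # 1 = hit, 0 = rest (illustrative encoding)
    played = [1, 0, 0, 0]
    novelty = CountNovelty()
    first = combined_reward(played, target, novelty)
    second = combined_reward(played, target, novelty)
    print(first, second)    # the bonus shrinks when a pattern repeats
```

With `beta=0`, the sketch reduces to the extrinsic-only case, in which the best achievable pattern is the target itself. A positive `beta` rewards patterns the agent has produced less often, which is one way a learner could be pushed away from pure reproduction of the input.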