Martin Charles Patrick, Glette Kyrre, Nygaard Tønnes Frostad, Torresen Jim
Research School of Computer Science, Australian National University, Canberra, ACT, Australia.
Department of Informatics, University of Oslo, Oslo, Norway.
Front Artif Intell. 2020 Mar 3;3:6. doi: 10.3389/frai.2020.00006. eCollection 2020.
Machine-learning models of music often exist outside the worlds of musical performance practice and are abstracted from the physical gestures of musicians. In this work, we consider how a recurrent neural network (RNN) model of simple music gestures may be integrated into a physical instrument so that predictions are sonically and physically entwined with the performer's actions. We introduce EMPI, an embodied musical prediction interface that simplifies musical interaction and prediction to just one dimension of continuous input and output. The predictive model is a mixture density RNN trained to estimate the performer's next physical input action and the time at which this will occur. Predictions are represented sonically through synthesized audio, and physically with a motorized output indicator. We use EMPI to investigate how performers understand and exploit different predictive models to make music through a controlled study of performances with different models and levels of physical feedback. We show that while performers often favor a model trained on human-sourced data, they find different musical affordances in models trained on synthetic, and even random, data. Physical representation of predictions seemed to affect the length of performances. This work contributes new understandings of how musicians use generative ML models in real-time performance, backed by experimental evidence. We argue that a constrained musical interface can expose the affordances of embodied predictive interactions.
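The abstract describes a mixture density RNN that predicts the performer's next one-dimensional input and the time until it occurs. The sampling step of such a model can be sketched as below; this is a minimal NumPy illustration assuming a diagonal-covariance Gaussian mixture over (position, time-delta), with illustrative parameter values. The function name and parameters are hypothetical, not the authors' implementation.

```python
import numpy as np

def sample_mixture(pi, mu, sigma, rng):
    """Sample one (position, dt) pair from a 2-D Gaussian mixture.

    pi:    (K,) mixture weights summing to 1
    mu:    (K, 2) component means: [position, time-delta]
    sigma: (K, 2) per-dimension standard deviations (diagonal covariance)
    """
    k = rng.choice(len(pi), p=pi)              # pick a mixture component
    sample = rng.normal(mu[k], sigma[k])       # draw from that Gaussian
    position = float(np.clip(sample[0], 0.0, 1.0))  # input is one continuous dim in [0, 1]
    dt = float(max(sample[1], 0.0))            # time to next event is non-negative
    return position, dt

# Illustrative mixture parameters; in a full MDN-RNN the network would
# emit pi, mu, and sigma at every step from the performer's recent gestures.
rng = np.random.default_rng(0)
pi = np.array([0.7, 0.3])
mu = np.array([[0.2, 0.1], [0.8, 0.5]])
sigma = np.array([[0.05, 0.02], [0.05, 0.1]])

position, dt = sample_mixture(pi, mu, sigma, rng)
```

Sampling (rather than taking the most likely component) is what lets such a model produce varied, non-deterministic continuations of a performer's gestures.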