Zhang Ning
IEEE Trans Neural Netw Learn Syst. 2023 Apr;34(4):1754-1763. doi: 10.1109/TNNLS.2020.2990746. Epub 2023 Apr 4.
Symbolic music generation is still an unsettled problem facing several challenges. A complete music score is a very long note sequence consisting of multiple tracks, with recurring elements and their variants at various levels. The transformer model, benefiting from its self-attention, has shown advantages in modeling long sequences, and there have been some attempts to apply transformer-based models to music generation. However, previous works train the model using the same strategy as the text generation task, despite the obvious differences between the patterns of text and music. These models cannot consistently produce music samples of high quality. In this article, we propose a novel adversarial transformer to generate music pieces with high musicality. Generative adversarial learning and self-attention networks are combined creatively. The generation of long sequences is guided by adversarial objectives, which provide a strong regularization that enforces the transformer to focus on learning the global and local structures. Instead of adopting the time-consuming Monte Carlo (MC) search method commonly used in existing sequence generative models, we propose an effective and convenient method to compute the reward for each generated step (REGS) of a long sequence. The discriminator is trained to simultaneously optimize elaborately designed global and local loss objectives, which enables it to provide reliable REGS to the generator. The adversarial objective, combined with the teacher-forcing objective, guides the training of the generator. The proposed model can generate single-track or multitrack music pieces. Experiments show that our model generates long music pieces of improved quality compared with the original music transformers.
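The abstract does not give implementation details, but the core idea of REGS can be illustrated in a minimal sketch: instead of estimating each step's reward by many Monte Carlo rollouts, the discriminator scores every prefix of the generated sequence in a single pass, and those per-step scores weight a policy-gradient term that is blended with the teacher-forcing (maximum-likelihood) term. The toy discriminator, vocabulary, sequence, and 0.5/0.5 loss weighting below are all illustrative assumptions, not the authors' actual networks or hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator_step_scores(tokens, weights):
    # Toy stand-in for a discriminator: it scores every prefix of the
    # sequence in one left-to-right pass, so each generated step t gets
    # a reward D(x_1..x_t) without any Monte Carlo rollouts.
    scores = []
    h = 0.0
    for tok in tokens:
        h = 0.9 * h + weights[tok]               # running summary of the prefix
        scores.append(1.0 / (1.0 + np.exp(-h)))  # sigmoid -> "realness" in (0, 1)
    return np.array(scores)

# Hypothetical vocabulary of 8 note events and a random generated sequence.
vocab_size = 8
weights = rng.normal(size=vocab_size)
fake_seq = rng.integers(0, vocab_size, size=16)

# One reward per generated step (REGS).
rewards = discriminator_step_scores(fake_seq, weights)

# Generator objective: a policy-gradient term weighted by the per-step
# rewards, blended with a teacher-forcing (MLE) term, as the abstract
# describes. log_probs is a stand-in for the generator's log pi(x_t).
log_probs = np.log(rng.uniform(0.1, 1.0, size=16))
mle_loss = -log_probs.mean()
pg_loss = -(rewards * log_probs).mean()
loss = 0.5 * mle_loss + 0.5 * pg_loss
```

In this sketch the prefix scoring costs one forward pass over the sequence, whereas an MC-search estimate of the same per-step rewards would require sampling many completions for every position, which is the efficiency gain the abstract claims.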