Department of Computer Science and Electrical Engineering, Kumamoto University, Kumamoto 860-8555, Japan.
Machine Intelligence Laboratory, National University of Mongolia, Ulaanbaatar 14201, Mongolia.
Sensors (Basel). 2020 Oct 26;20(21):6077. doi: 10.3390/s20216077.
In recent years, researchers have shown increasing interest in music information retrieval (MIR) applications, with automatic chord recognition being one of the most popular tasks. Many studies have achieved considerable improvements in automatic chord recognition using deep-learning-based models. However, most existing models focus on simple chord recognition, which classifies the root note together with major, minor, and seventh chords. Furthermore, in learning-based recognition, it is critical to collect a large amount of high-quality training data to achieve the desired performance. In this paper, we present a multi-task learning (MTL) model for guitar chord recognition, trained on a relatively large-vocabulary guitar chord dataset. To address data scarcity, we employ a physical data augmentation method that records the chord dataset directly from a robotic performer. A deep-learning-based MTL model is proposed to improve the performance of automatic chord recognition using this physically augmented dataset. The proposed MTL model is compared with four baseline models and with its corresponding single-task learning model on two datasets: a human dataset, and the human dataset combined with the augmented dataset. The proposed methods outperform the baseline models, and most scores of the proposed MTL model are better than those of the corresponding single-task learning model. The experimental results demonstrate that physical data augmentation is an effective method for increasing the dataset size for guitar chord recognition tasks.