Multimedia Systems Department, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Narutowicza 11/12, 80-233 Gdańsk, Poland.
Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Narutowicza 11/12, 80-233 Gdańsk, Poland.
Sensors (Basel). 2022 Apr 15;22(8):3033. doi: 10.3390/s22083033.
This work proposes a novel approach to automatically identifying all instruments present in an audio excerpt, using a set of individual convolutional neural networks (CNNs), one per tested instrument. The paper begins with background material: a description of metadata and a review of related work on musical instrument identification, covering the tasks performed, input types, algorithms employed, and metrics used. It then describes the dataset prepared for the experiment and its division into training, validation, and evaluation subsets, followed by the neural network architecture analyzed. The model is trained, and several quality metrics are determined for the training and validation sets. The trained network is then evaluated on a separate set, and detailed values are reported for precision, recall, and the numbers of true and false positive and negative detections. Model performance is high, with metric values ranging from 0.86 for the guitar to 0.99 for drums. The paper closes with a discussion and a summary of the results.
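As a rough illustration of the evaluation described above, the following Python sketch treats each instrument as an independent yes/no detection and derives per-instrument precision and recall from the true and false positive and negative counts. It is not the authors' implementation: the names `Counts`, `identify`, and `evaluate`, the 0.5 decision threshold, and the toy scores and labels are all assumptions made for illustration, and the per-instrument CNN outputs are stood in for by hand-written numbers.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Detection counts for one instrument (hypothetical helper)."""
    tp: int = 0
    fp: int = 0
    fn: int = 0
    tn: int = 0

    @property
    def precision(self) -> float:
        detected = self.tp + self.fp
        return self.tp / detected if detected else 0.0

    @property
    def recall(self) -> float:
        present = self.tp + self.fn
        return self.tp / present if present else 0.0


def identify(scores: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Return every instrument whose detector score reaches the threshold.

    Each score is assumed to come from a separate binary detector, so any
    number of instruments (including none) can be reported per excerpt.
    """
    return {name for name, score in scores.items() if score >= threshold}


def evaluate(predictions: list[set[str]], truths: list[set[str]],
             instruments: list[str]) -> dict[str, Counts]:
    """Accumulate TP/FP/FN/TN per instrument over all excerpts."""
    counts = {name: Counts() for name in instruments}
    for predicted, actual in zip(predictions, truths):
        for name in instruments:
            c = counts[name]
            if name in predicted and name in actual:
                c.tp += 1
            elif name in predicted:
                c.fp += 1
            elif name in actual:
                c.fn += 1
            else:
                c.tn += 1
    return counts


# Toy data, invented for illustration only (not results from the paper).
instruments = ["guitar", "drums"]
scores = [
    {"guitar": 0.9, "drums": 0.2},
    {"guitar": 0.4, "drums": 0.8},
    {"guitar": 0.7, "drums": 0.6},
    {"guitar": 0.6, "drums": 0.1},
]
truths = [{"guitar"}, {"guitar", "drums"}, {"drums"}, set()]

predictions = [identify(s) for s in scores]
counts = evaluate(predictions, truths, instruments)
for name in instruments:
    c = counts[name]
    print(f"{name}: precision={c.precision:.2f} recall={c.recall:.2f}")
```

Keeping a separate set of counts per instrument mirrors the one-detector-per-instrument idea: a weak detector for one class shows up in its own precision and recall without affecting the figures for the others.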