School of Information Science and Technology, Northwest University, Xi'an, China.
School of Computer Science and Electronic Engineering, University of Essex, Colchester, United Kingdom.
PLoS One. 2023 Apr 20;18(4):e0284560. doi: 10.1371/journal.pone.0284560. eCollection 2023.
In this paper, we create EMIR, the first-ever Music Information Retrieval dataset for Ethiopian music. EMIR is freely available for research purposes and contains 600 sample recordings of Orthodox Tewahedo chants, traditional Azmari songs and contemporary Ethiopian secular music. Each sample is classified by five expert judges into one of four well-known Ethiopian Kiñits: Tizita, Bati, Ambassel and Anchihoye. Each Kiñit uses its own pentatonic scale and also has its own stylistic characteristics. Thus, Kiñit classification needs to combine scale identification with genre recognition. After describing the dataset, we present the Ethio Kiñits Model (EKM), based on VGG, for classifying the EMIR clips. In Experiment 1, we investigated which of Filterbank, Mel-spectrogram, Chroma, or Mel-frequency Cepstral Coefficient (MFCC) features works best for Kiñit classification using EKM. MFCC was found to be superior and was therefore adopted for Experiment 2, where the performance of EKM models using MFCC was compared across three different audio sample lengths. A 3 s length gave the best results. In Experiment 3, EKM and four existing models were compared on the EMIR dataset: AlexNet, ResNet50, VGG16 and LSTM. EKM was found to have the best accuracy (95.00%) as well as the fastest training time. However, the performance of VGG16 (93.00%) was found not to be significantly worse (P < 0.01). We hope this work will encourage others to explore Ethiopian music and to experiment with other models for Kiñit classification.
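The abstract identifies MFCC features on 3 s clips as the best-performing input representation. As a minimal, numpy-only sketch of the standard MFCC pipeline (frame, window, power spectrum, mel filterbank, log compression, DCT-II), the following is illustrative: the paper's actual extraction settings (sample rate, frame size, hop, filter and coefficient counts, and the library used) are not stated in the abstract, so all parameters here are assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    # Frame the signal, apply a Hamming window, take the power spectrum
    frames = [signal[s:s + n_fft] * np.hamming(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel-filter, log-compress, then DCT-II to decorrelate the coefficients
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    n = log_mel.shape[1]
    dct = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :]
                 * np.arange(n_coeffs)[:, None])
    return log_mel @ dct.T  # shape: (n_frames, n_coeffs)

# A 3-second synthetic tone stands in for an EMIR clip (hypothetical input)
sr = 16000
t = np.arange(3 * sr) / sr
clip = np.sin(2 * np.pi * 440.0 * t)
feats = mfcc(clip, sr=sr)
print(feats.shape)
```

The resulting (frames x coefficients) matrix is the kind of 2-D feature map a VGG-style CNN such as EKM would take as input; in practice an MIR library's MFCC routine would typically be used instead of a hand-rolled one.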