Oguike Osondu, Primus Mpho
Institute for Intelligent Systems, University of Johannesburg, JBS Park, 69 Kingsway Avenue, Auckland Park, Johannesburg, South Africa.
Data Brief. 2024 Jun 26;55:110672. doi: 10.1016/j.dib.2024.110672. eCollection 2024 Aug.
Diverse traditional machine learning and deep learning models have been designed for multimodal music information retrieval (MIR) applications such as multimodal music sentiment analysis, genre classification, recommender systems, and emotion recognition, which makes these models indispensable for MIR tasks. However, solving these tasks in a data-driven manner depends on the availability of high-quality benchmark datasets, so datasets tailored to multimodal MIR applications are essential. While a handful of multimodal datasets exist for distinct MIR applications, none are available for low-resourced languages such as the Sotho-Tswana languages. In response to this gap, we introduce a novel multimodal dataset for various MIR applications. The dataset centres on Sotho-Tswana musical videos and encompasses the textual, visual, and audio modalities specific to Sotho-Tswana musical content. The musical videos were downloaded from YouTube, and Python programs using different Python libraries were written to process the videos and extract relevant spectral-based acoustic features. The dataset was annotated manually by native speakers of Sotho-Tswana languages who understand the culture and traditions of the Sotho-Tswana people. To our knowledge, no such dataset has been established until now.