A dataset for multimodal music information retrieval of Sotho-Tswana musical videos.

Authors

Oguike Osondu, Primus Mpho

Affiliation

Institute for Intelligent Systems, University of Johannesburg, JBS Park, 69 Kingsway Avenue, Auckland Park, Johannesburg, South Africa.

Publication information

Data Brief. 2024 Jun 26;55:110672. doi: 10.1016/j.dib.2024.110672. eCollection 2024 Aug.

DOI:10.1016/j.dib.2024.110672
PMID:39071970
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11282976/
Abstract

Many traditional machine learning and deep learning models have been designed for multimodal music information retrieval (MIR) applications such as multimodal music sentiment analysis, genre classification, recommender systems, and emotion recognition, which makes these models indispensable for MIR tasks. However, solving these tasks in a data-driven manner depends on the availability of high-quality benchmark datasets, so datasets tailored to multimodal MIR applications are essential. While a handful of multimodal datasets exist for distinct MIR applications, none are available for low-resourced languages such as the Sotho-Tswana languages. In response to this gap, we introduce a novel multimodal dataset for various MIR applications. The dataset centres on Sotho-Tswana musical videos and encompasses the textual, visual, and audio modalities specific to Sotho-Tswana musical content. The musical videos were downloaded from YouTube, and Python programs were written to process them and extract relevant spectral-based acoustic features using different Python libraries. The dataset was annotated manually by native speakers of Sotho-Tswana languages who understand the culture and traditions of the Sotho-Tswana people. To our knowledge, no such dataset has been established until now.
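
The abstract does not say which spectral features were extracted or which Python libraries were used. As a rough illustration of what a "spectral-based acoustic feature" is, the sketch below computes the spectral centroid (the magnitude-weighted mean frequency) of a single audio frame using only the Python standard library. The function names `magnitude_spectrum` and `spectral_centroid`, the frame size, and the synthetic test tone are assumptions made for this example and are not taken from the authors' code.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Return DFT magnitudes for bins 0..N/2 of a real-valued frame.

    A naive O(N^2) DFT is used so the sketch has no third-party
    dependencies; a real pipeline would normally use an FFT library.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        mags.append(abs(acc))
    return mags


def spectral_centroid(frame, sample_rate):
    """Return the magnitude-weighted mean frequency (Hz) of one frame."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: the centroid is undefined, report 0
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    return sum(f * m for f, m in zip(freqs, mags)) / total


# Hypothetical input: a 1 kHz sine sampled at 8 kHz (assumed values).
sample_rate = 8000
n = 256
tone_hz = 1000.0  # falls exactly on DFT bin 32 for this frame size
frame = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(n)]

centroid = spectral_centroid(frame, sample_rate)
print(f"spectral centroid: {centroid:.1f} Hz")
```

For a pure tone the centroid sits at the tone's frequency; for music it tracks the perceived "brightness" of the sound. In a real workflow this computation would be applied frame by frame to audio decoded from each video, typically alongside other spectral descriptors.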

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8eed/11282976/2b8584e4d2c2/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8eed/11282976/4f0e0b190367/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8eed/11282976/c903200d3f50/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8eed/11282976/4654ffd4b4c8/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8eed/11282976/9278975eb0ed/gr4.jpg

Similar articles

1
A dataset for multimodal music information retrieval of Sotho-Tswana musical videos.
Data Brief. 2024 Jun 26;55:110672. doi: 10.1016/j.dib.2024.110672. eCollection 2024 Aug.
2
Creating musical features using multi-faceted, multi-task encoders based on transformers.
Sci Rep. 2023 Jul 3;13(1):10713. doi: 10.1038/s41598-023-36714-z.
3
pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis.
PLoS One. 2015 Dec 11;10(12):e0144610. doi: 10.1371/journal.pone.0144610. eCollection 2015.
4
Intelligent Classification Model of Music Emotional Environment Using Convolutional Neural Networks.
J Environ Public Health. 2022 Aug 30;2022:7221064. doi: 10.1155/2022/7221064. eCollection 2022.
5
Multi-Modal Song Mood Detection with Deep Learning.
Sensors (Basel). 2022 Jan 29;22(3):1065. doi: 10.3390/s22031065.
6
Deep-Learning-Based Multimodal Emotion Classification for Music Videos.
Sensors (Basel). 2021 Jul 20;21(14):4927. doi: 10.3390/s21144927.
7
Fusion of electroencephalographic dynamics and musical contents for estimating emotional responses in music listening.
Front Neurosci. 2014 May 1;8:94. doi: 10.3389/fnins.2014.00094. eCollection 2014.
8
Music video emotion classification using slow-fast audio-video network and unsupervised feature representation.
Sci Rep. 2021 Oct 6;11(1):19834. doi: 10.1038/s41598-021-98856-2.
9
Musical emotions in the absence of music: A cross-cultural investigation of emotion communication in music by extra-musical cues.
PLoS One. 2020 Nov 18;15(11):e0241196. doi: 10.1371/journal.pone.0241196. eCollection 2020.
10
Multimodal robotic music performance art based on GRU-GoogLeNet model fusing audiovisual perception.
Front Neurorobot. 2024 Jan 30;17:1324831. doi: 10.3389/fnbot.2023.1324831. eCollection 2023.
