Suppr 超能文献


Music video emotion classification using slow-fast audio-video network and unsupervised feature representation.

Affiliation

Department of Computer Science and Engineering, Jeonbuk National University, Jeonju, South Korea.

Publication Information

Sci Rep. 2021 Oct 6;11(1):19834. doi: 10.1038/s41598-021-98856-2.

DOI: 10.1038/s41598-021-98856-2
PMID: 34615904
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8494760/
Abstract

Affective computing has suffered from the difficulty of precise annotation, because emotions are highly subjective and vague. Music video emotion is complex due to the diverse textual, acoustic, and visual information, which can take the form of lyrics, the singer's voice, sounds from different instruments, and visual representations. This may be one reason why there has been limited study in this domain and no standard dataset had been produced before now. In this study, we proposed an unsupervised method for music video emotion analysis using music video content from the Internet. We also produced a labelled dataset and compared supervised and unsupervised methods for emotion classification. The music and video information is processed through a multimodal architecture with audio-video information exchange and a boosting method. General 2D and 3D convolution networks were compared with the slow-fast network with filter- and channel-separable convolutions in the multimodal architecture. Several supervised and unsupervised networks were trained in an end-to-end manner, and the results were evaluated using various evaluation metrics. The proposed method used a large dataset for unsupervised emotion classification and interpreted the results quantitatively and qualitatively for music videos, which had never been done in the past. The results show a large increase in classification score when using unsupervised features and information-sharing techniques across the audio and video networks. Our best classifier attained 77% accuracy, an F1-score of 0.77, and an area-under-the-curve score of 0.94 with minimal computational cost.
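The abstract describes a multimodal architecture that combines audio and video information with an exchange and boosting mechanism. As a hedged, minimal sketch — not the paper's actual model or its information-exchange scheme — one simple way to combine per-modality outputs is late fusion by weighted probability averaging; the function names, class labels, and probability values below are illustrative assumptions only.

```python
# Minimal sketch (illustrative only): late fusion of per-class probabilities
# from a hypothetical audio branch and video branch of a multimodal classifier.

def fuse(audio_probs, video_probs, audio_weight=0.5):
    """Weighted average of two modality-specific probability vectors."""
    w = audio_weight
    return [w * a + (1 - w) * v for a, v in zip(audio_probs, video_probs)]

def predict(probs, classes):
    """Return the class name with the highest fused probability."""
    return classes[max(range(len(probs)), key=lambda i: probs[i])]

# Made-up emotion class names and branch outputs, not the paper's data.
classes = ["exciting", "fear", "neutral", "relaxation", "sad", "tension"]
audio_probs = [0.10, 0.05, 0.15, 0.40, 0.20, 0.10]
video_probs = [0.05, 0.10, 0.10, 0.55, 0.10, 0.10]

fused = fuse(audio_probs, video_probs)
print(predict(fused, classes))  # relaxation
```

A weighted average is the simplest fusion baseline; the paper's architecture exchanges features between the audio and video networks during training rather than only merging final outputs.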

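The abstract reports 77% accuracy, an F1-score of 0.77, and an area under the ROC curve of 0.94. As a hedged illustration of how these metrics are defined — not the paper's evaluation code — they can be computed in pure Python as follows; the labels, predictions, and scores below are made-up examples, and the AUC helper handles only the binary case.

```python
# Illustrative metric definitions (accuracy, binary F1, binary ROC AUC).
# Example inputs are synthetic, not the paper's data.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_binary(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc_binary(y_true, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):  # assign average (1-based) ranks, ties share the mean
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    rank_sum = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]
print(accuracy(y_true, y_pred))          # 0.75
print(f1_binary(y_true, y_pred))         # 0.75
print(auc_binary(y_true, scores))        # 0.9375
```

For a multi-class problem such as emotion classification, F1 and AUC are typically averaged over per-class one-vs-rest computations.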

Figures (Fig. 1-11, PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e712/8494760/609cf67e2814/41598_2021_98856_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e712/8494760/08736f488751/41598_2021_98856_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e712/8494760/1c52d9b1d667/41598_2021_98856_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e712/8494760/8fa9de1c2a18/41598_2021_98856_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e712/8494760/f4a7a13816e9/41598_2021_98856_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e712/8494760/3014c902e2d5/41598_2021_98856_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e712/8494760/226a969c11d4/41598_2021_98856_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e712/8494760/7dbc9564df52/41598_2021_98856_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e712/8494760/1b9dc1404eaf/41598_2021_98856_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e712/8494760/b8963702213b/41598_2021_98856_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e712/8494760/7c03042dba01/41598_2021_98856_Fig11_HTML.jpg

Similar Articles

1. Music video emotion classification using slow-fast audio-video network and unsupervised feature representation.
   Sci Rep. 2021 Oct 6;11(1):19834. doi: 10.1038/s41598-021-98856-2.
2. Deep-Learning-Based Multimodal Emotion Classification for Music Videos.
   Sensors (Basel). 2021 Jul 20;21(14):4927. doi: 10.3390/s21144927.
3. A Comparison Study of Deep Learning Methodologies for Music Emotion Recognition.
   Sensors (Basel). 2024 Mar 29;24(7):2201. doi: 10.3390/s24072201.
4. Intelligent Classification Model of Music Emotional Environment Using Convolutional Neural Networks.
   J Environ Public Health. 2022 Aug 30;2022:7221064. doi: 10.1155/2022/7221064. eCollection 2022.
5. Music Emotion Classification Method Based on Deep Learning and Improved Attention Mechanism.
   Comput Intell Neurosci. 2022 Jun 20;2022:5181899. doi: 10.1155/2022/5181899. eCollection 2022.
6. A Dual-Path Cross-Modal Network for Video-Music Retrieval.
   Sensors (Basel). 2023 Jan 10;23(2):805. doi: 10.3390/s23020805.
7. AVaTER: Fusing Audio, Visual, and Textual Modalities Using Cross-Modal Attention for Emotion Recognition.
   Sensors (Basel). 2024 Sep 10;24(18):5862. doi: 10.3390/s24185862.
8. A Music Emotion Classification Model Based on the Improved Convolutional Neural Network.
   Comput Intell Neurosci. 2022 Feb 14;2022:6749622. doi: 10.1155/2022/6749622. eCollection 2022.
9. Comparing supervised and unsupervised approaches to multimodal emotion recognition.
   PeerJ Comput Sci. 2021 Dec 24;7:e804. doi: 10.7717/peerj-cs.804. eCollection 2021.
10. A Multimodal Convolutional Neural Network Model for the Analysis of Music Genre on Children's Emotions Influence Intelligence.
    Comput Intell Neurosci. 2022 Aug 29;2022:5611456. doi: 10.1155/2022/5611456. eCollection 2022.

References Cited in This Article

1. Deep-Learning-Based Multimodal Emotion Classification for Music Videos.
   Sensors (Basel). 2021 Jul 20;21(14):4927. doi: 10.3390/s21144927.
2. Editorial: The Impact of Music on Human Development and Well-Being.
   Front Psychol. 2020 Jun 17;11:1246. doi: 10.3389/fpsyg.2020.01246. eCollection 2020.
3. Deep Joint Spatiotemporal Network (DJSTN) for Efficient Facial Expression Recognition.
   Sensors (Basel). 2020 Mar 30;20(7):1936. doi: 10.3390/s20071936.
4. Squeeze-and-Excitation Networks.
   IEEE Trans Pattern Anal Mach Intell. 2020 Aug;42(8):2011-2023. doi: 10.1109/TPAMI.2019.2913372. Epub 2019 Apr 29.
5. The Effects of User Engagements for User and Company Generated Videos on Music Sales: Empirical Evidence From YouTube.
   Front Psychol. 2018 Oct 5;9:1880. doi: 10.3389/fpsyg.2018.01880. eCollection 2018.
6. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English.
   PLoS One. 2018 May 16;13(5):e0196391. doi: 10.1371/journal.pone.0196391. eCollection 2018.
7. Developing a benchmark for emotional analysis of music.
   PLoS One. 2017 Mar 10;12(3):e0173392. doi: 10.1371/journal.pone.0173392. eCollection 2017.
8. Individual differences in musical taste.
   Am J Psychol. 2010 Summer;123(2):199-208. doi: 10.5406/amerjpsyc.123.2.0199.