
Statistical and Visual Analysis of Audio, Text, and Image Features for Multi-Modal Music Genre Recognition.

Authors

Wilkes Ben, Vatolkin Igor, Müller Heinrich

Affiliation

Department of Computer Science, Technische Universität Dortmund, 44227 Dortmund, Germany.

Publication

Entropy (Basel). 2021 Nov 12;23(11):1502. doi: 10.3390/e23111502.

DOI: 10.3390/e23111502
PMID: 34828199
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8621318/
Abstract

We present a multi-modal genre recognition framework that considers the modalities audio, text, and image by features extracted from audio signals, album cover images, and lyrics of music tracks. In contrast to pure learning of features by a neural network as done in the related work, handcrafted features designed for a respective modality are also integrated, allowing for higher interpretability of created models and further theoretical analysis of the impact of individual features on genre prediction. Genre recognition is performed by binary classification of a music track with respect to each genre based on combinations of elementary features. For feature combination a two-level technique is used, which combines aggregation into fixed-length feature vectors with confidence-based fusion of classification results. Extensive experiments have been conducted for three classifier models (Naïve Bayes, Support Vector Machine, and Random Forest) and numerous feature combinations. The results are presented visually, with data reduction for improved perceptibility achieved by multi-objective analysis and restriction to non-dominated data. Feature- and classifier-related hypotheses are formulated based on the data, and their statistical significance is formally analyzed. The statistical analysis shows that the combination of two modalities almost always leads to a significant increase of performance and the combination of three modalities in several cases.
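
The restriction to non-dominated data mentioned in the abstract can be illustrated with a minimal sketch. It assumes two objectives that are both minimized, classification error and number of features; these objectives, the sample values, and the function names are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical objectives, both minimized: (classification error, number of features).
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative results for five feature combinations (made-up numbers).
results = [(0.30, 2), (0.25, 5), (0.25, 8), (0.40, 1), (0.35, 3)]
front = non_dominated(results)
print(front)
```

Here (0.25, 8) is dropped because (0.25, 5) reaches the same error with fewer features, and (0.35, 3) is dropped because (0.30, 2) is better in both objectives. Plotting only the remaining points is one way to reduce the amount of data shown per figure.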

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8628/8621318/fe39b61e9040/entropy-23-01502-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8628/8621318/663e2d175ed1/entropy-23-01502-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8628/8621318/d194ca045ecf/entropy-23-01502-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8628/8621318/b2179309b097/entropy-23-01502-g003.jpg

Similar articles

1
Statistical and Visual Analysis of Audio, Text, and Image Features for Multi-Modal Music Genre Recognition.
Entropy (Basel). 2021 Nov 12;23(11):1502. doi: 10.3390/e23111502.
2
A computational lens into how music characterizes genre in film.
PLoS One. 2021 Apr 8;16(4):e0249957. doi: 10.1371/journal.pone.0249957. eCollection 2021.
3
Multi-Modal Song Mood Detection with Deep Learning.
Sensors (Basel). 2022 Jan 29;22(3):1065. doi: 10.3390/s22031065.
4
Design of Neural Network Model for Cross-Media Audio and Video Score Recognition Based on Convolutional Neural Network Model.
Comput Intell Neurosci. 2022 Jun 13;2022:4626867. doi: 10.1155/2022/4626867. eCollection 2022.
5
Audio-visual multi-modality driven hybrid feature learning model for crowd analysis and classification.
Math Biosci Eng. 2023 May 25;20(7):12529-12561. doi: 10.3934/mbe.2023558.
6
A Multimodal Convolutional Neural Network Model for the Analysis of Music Genre on Children's Emotions Influence Intelligence.
Comput Intell Neurosci. 2022 Aug 29;2022:5611456. doi: 10.1155/2022/5611456. eCollection 2022.
7
Research on Classroom Online Teaching Model of "Learning" Wisdom Music on Wireless Network under the Background of Artificial Intelligence.
Comput Math Methods Med. 2021 Nov 27;2021:3141661. doi: 10.1155/2021/3141661. eCollection 2021.
8
An Empathy Evaluation System Using Spectrogram Image Features of Audio.
Sensors (Basel). 2021 Oct 26;21(21):7111. doi: 10.3390/s21217111.
9
A Multi-Modal Convolutional Neural Network Model for Intelligent Analysis of the Influence of Music Genres on Children's Emotions.
Comput Intell Neurosci. 2022 Jul 19;2022:4957085. doi: 10.1155/2022/4957085. eCollection 2022.
10
Multi-Modal Residual Perceptron Network for Audio-Video Emotion Recognition.
Sensors (Basel). 2021 Aug 12;21(16):5452. doi: 10.3390/s21165452.