An Audiovisual Correlation Matching Method Based on Fine-Grained Emotion and Feature Fusion.

Affiliations

State Key Laboratory of Media Convergence and Communication, Beijing 100024, China.

Key Laboratory of Acoustic Visual Technology and Intelligent Control System, Ministry of Culture and Tourism, Beijing 100024, China.

Publication Information

Sensors (Basel). 2024 Aug 31;24(17):5681. doi: 10.3390/s24175681.

DOI: 10.3390/s24175681
PMID: 39275592
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11397978/
Abstract

Most existing intelligent editing tools for music and video rely on cross-modal matching based on affective consistency or the similarity of feature representations. However, these methods are not fully applicable to complex audiovisual matching scenarios: ambiguous matching rules and associated factors lead to low matching accuracy and suboptimal audience perceptual effects. To address these limitations, this paper considers both the similarity and the integration of affective distributions in artistic audiovisual works, namely film and television video paired with music. Building on rich emotional perception elements, we propose a hybrid matching model that combines feature canonical correlation analysis (CCA) with fine-grained affective similarity. The model refines kernel CCA (KCCA) fusion features by analyzing both matched and unmatched music-video pairs. It then employs XGBoost to predict relevance and computes similarity from the fine-grained affective semantic distance together with the affective factor distance. The final matching prediction is obtained through weight allocation between these signals. Experimental results on a self-built dataset demonstrate that the proposed affective matching model balances feature parameters and affective semantic cognition, yielding relatively high prediction accuracy and a better subjective experience of audiovisual association. This work is relevant to exploring the affective association mechanisms of audiovisual objects from a sensory perspective and to improving related intelligent tools, thereby offering a novel technical approach to retrieval and matching in music-video editing.

Figures 1-5 (PMC11397978):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/533f/11397978/f06a3fc6d1be/sensors-24-05681-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/533f/11397978/0efb531d9540/sensors-24-05681-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/533f/11397978/5214f015cae5/sensors-24-05681-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/533f/11397978/6476f6dc1e02/sensors-24-05681-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/533f/11397978/4dd959f25316/sensors-24-05681-g005.jpg

Similar Articles

1
An Audiovisual Correlation Matching Method Based on Fine-Grained Emotion and Feature Fusion.
Sensors (Basel). 2024 Aug 31;24(17):5681. doi: 10.3390/s24175681.
2
Design of Semantic Matching Model of Folk Music in Occupational Therapy Based on Audio Emotion Analysis.
Occup Ther Int. 2022 Jun 18;2022:6841445. doi: 10.1155/2022/6841445. eCollection 2022.
3
Electroencephalography Amplitude Modulation Analysis for Automated Affective Tagging of Music Video Clips.
Front Comput Neurosci. 2018 Jan 10;11:115. doi: 10.3389/fncom.2017.00115. eCollection 2017.
4
Query-Adaptive Late Fusion for Hierarchical Fine-Grained Video-Text Retrieval.
IEEE Trans Neural Netw Learn Syst. 2022 Oct 24;PP. doi: 10.1109/TNNLS.2022.3214208.
5
EEG Emotion Recognition Applied to the Effect Analysis of Music on Emotion Changes in Psychological Healthcare.
Int J Environ Res Public Health. 2022 Dec 26;20(1):378. doi: 10.3390/ijerph20010378.
6
A Dual-Path Cross-Modal Network for Video-Music Retrieval.
Sensors (Basel). 2023 Jan 10;23(2):805. doi: 10.3390/s23020805.
7
Fine-Grained Cross-Modal Semantic Consistency in Natural Conservation Image Data from a Multi-Task Perspective.
Sensors (Basel). 2024 May 14;24(10):3130. doi: 10.3390/s24103130.
8
AttendAffectNet-Emotion Prediction of Movie Viewers Using Multimodal Fusion with Self-Attention.
Sensors (Basel). 2021 Dec 14;21(24):8356. doi: 10.3390/s21248356.
9
Latent Space Semantic Supervision Based on Knowledge Distillation for Cross-Modal Retrieval.
IEEE Trans Image Process. 2022;31:7154-7164. doi: 10.1109/TIP.2022.3220051. Epub 2022 Nov 16.
10
Design of Neural Network Model for Cross-Media Audio and Video Score Recognition Based on Convolutional Neural Network Model.
Comput Intell Neurosci. 2022 Jun 13;2022:4626867. doi: 10.1155/2022/4626867. eCollection 2022.

Cited By

1
ARMNet: A Network for Image Dimensional Emotion Prediction Based on Affective Region Extraction and Multi-Channel Fusion.
Sensors (Basel). 2024 Nov 4;24(21):7099. doi: 10.3390/s24217099.

References

1
Fine-Grained Cross-Modal Semantic Consistency in Natural Conservation Image Data from a Multi-Task Perspective.
Sensors (Basel). 2024 May 14;24(10):3130. doi: 10.3390/s24103130.
2
Prediction of Ground Wave Propagation Delay for MF R-Mode.
Sensors (Basel). 2024 Jan 3;24(1):282. doi: 10.3390/s24010282.
3
Towards Personalised Mood Prediction and Explanation for Depression from Biophysical Data.
Sensors (Basel). 2023 Dec 27;24(1):164. doi: 10.3390/s24010164.
4
A Short Video Classification Framework Based on Cross-Modal Fusion.
Sensors (Basel). 2023 Oct 12;23(20):8425. doi: 10.3390/s23208425.
5
Using AI-ML to Augment the Capabilities of Social Media for Telehealth and Remote Patient Monitoring.
Healthcare (Basel). 2023 Jun 10;11(12):1704. doi: 10.3390/healthcare11121704.
6
Dynamic Heterogeneous User Generated Contents-Driven Relation Assessment via Graph Representation Learning.
Sensors (Basel). 2022 Feb 11;22(4):1402. doi: 10.3390/s22041402.
7
An Empathy Evaluation System Using Spectrogram Image Features of Audio.
Sensors (Basel). 2021 Oct 26;21(21):7111. doi: 10.3390/s21217111.
8
Cross-Domain Visual Matching via Generalized Similarity Measure and Feature Learning.
IEEE Trans Pattern Anal Mach Intell. 2017 Jun;39(6):1089-1102. doi: 10.1109/TPAMI.2016.2567386. Epub 2016 May 12.
9
Multimodal Similarity-Preserving Hashing.
IEEE Trans Pattern Anal Mach Intell. 2014 Apr;36(4):824-30. doi: 10.1109/TPAMI.2013.225.