
Design of Neural Network Model for Cross-Media Audio and Video Score Recognition Based on Convolutional Neural Network Model.

Affiliation

Xinyang Normal University, Xinyang, Henan 464000, China.

Publication information

Comput Intell Neurosci. 2022 Jun 13;2022:4626867. doi: 10.1155/2022/4626867. eCollection 2022.

DOI: 10.1155/2022/4626867
PMID: 35733575
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9208963/
Abstract

In this paper, a residual convolutional neural network is used to extract note features from music score images, which addresses the problem of model degradation. Multiscale feature fusion then combines feature information from different levels of the same feature map to strengthen the model's feature representation. Notes are recognized by a network composed of a bidirectional simple recurrent unit and a connectionist temporal classification (CTC) function. This design parallelizes a large share of the computation, which speeds up training convergence, and it removes the need for strictly aligned labels, which lowers the requirements on the dataset. To address the insufficient mining of local consistency within modalities by existing cross-modal retrieval methods based on a common subspace, a cross-modal retrieval method that incorporates graph convolution is proposed. The K-nearest neighbor algorithm is used to construct a modal graph for the samples of each modality. The original features of samples from different modalities are encoded by a symmetric graph convolutional coding network and a symmetric multilayer fully connected coding network, and the encoded features are then fused and fed forward as input. Intramodal semantic constraints and intermodal modality-invariant constraints are jointly optimized in the common subspace to learn common representations that are both locally consistent and semantically consistent for samples from different modalities. The error values of the experimental results illustrate the effect of parameters such as the number of iterations and the number of neurons on the network. To show more accurately that the generated music sequence closely resembles the original, the generated music sequence is also divided into frames, and spectrograms of the music sequences are generated. The accuracy of the experiment is illustrated by comparing the spectrograms, and genre classification predictions on the generated music show that the network can generate music of different genres.
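
The abstract states that the temporal classification function removes the need for strictly aligned labels. As a rough illustration of why that holds, the sketch below shows greedy decoding for connectionist temporal classification: the network emits one score vector per image column (frame), and decoding merges repeated labels and drops a blank symbol, so no frame-level annotation is needed. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name `ctc_greedy_decode`, the blank index, the three-symbol vocabulary, and the score values are all hypothetical.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank: int = BLANK
) -> List[int]:
    """Pick the best class per frame, merge consecutive repeats, then drop blanks."""
    # Best-scoring class index for every frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # Collapse runs of the same label into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove the blank symbol, which only serves as a separator.
    return [label for label in collapsed if label != blank]


# Six frames over a hypothetical vocabulary {0: blank, 1: "C4", 2: "D4"}.
scores = [
    [0.1, 0.8, 0.1],    # C4
    [0.2, 0.7, 0.1],    # C4 again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # C4, kept as a new note because a blank precedes it
    [0.1, 0.2, 0.7],    # D4
    [0.6, 0.2, 0.2],    # blank
]
decoded = ctc_greedy_decode(scores)
print(decoded)  # [1, 1, 2]
```

The six frames decode to the three-note sequence C4, C4, D4, although nothing specifies which frame belongs to which note. During training, a CTC loss sums over all such frame-level paths that collapse to the target sequence, which is why only the note sequence per score image has to be labeled.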

