Xinyang Normal University, Xinyang, Henan 464000, China.
Comput Intell Neurosci. 2022 Jun 13;2022:4626867. doi: 10.1155/2022/4626867. eCollection 2022.
In this paper, a residual convolutional neural network is used to extract note features from music score images, alleviating the problem of model degradation; multiscale feature fusion then combines feature information from different levels of the same feature map to strengthen the model's feature representation ability. A network composed of bidirectional simple recurrent units (SRU) and a connectionist temporal classification (CTC) function is used to recognize the notes: the SRU parallelizes a large amount of the computation, which speeds up training convergence, while CTC removes the need for strict label alignment in the dataset and thus lowers the requirements on the data. To address the insufficiency of existing common-subspace cross-modal retrieval methods in mining local consistency within modalities, a cross-modal retrieval method incorporating graph convolution is proposed. The K-nearest-neighbor algorithm is used to construct a modal graph for the samples of each modality; the original features of samples from different modalities are encoded by a symmetric graph convolutional encoding network and a symmetric multilayer fully connected encoding network, and the encoded features are fused before being projected into the common subspace. Intramodal semantic constraints and intermodal modality-invariant constraints are jointly optimized in the common subspace to learn common representations with high local consistency and semantic consistency for samples from different modalities. The error values of the experimental results illustrate the effect of parameters such as the number of iterations and the number of neurons on network performance. To show more precisely that the generated music sequences closely resemble the original sequences, the generated sequences are also divided into frames, from which frequency spectra and spectrograms are produced; comparing these plots demonstrates the accuracy of the experiment, and genre classification is performed on the generated music to show that the network can generate music of different genres.
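A minimal sketch of the note-recognition pipeline described above, written in PyTorch under stated assumptions: a bidirectional GRU stands in for the paper's simple recurrent unit (which has no built-in PyTorch module), and all layer sizes, the vocabulary size, and the fusion scheme are illustrative choices, not the paper's values.

```python
# Hedged sketch: residual-CNN feature extractor with multiscale fusion,
# feeding a bidirectional recurrent layer trained with CTC loss.
# A bidirectional GRU stands in for the paper's SRU; sizes are assumptions.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Identity shortcut counters model degradation in deeper stacks.
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class NoteRecognizer(nn.Module):
    def __init__(self, num_classes, channels=64, hidden=128):
        super().__init__()
        self.stem = nn.Conv2d(1, channels, 3, padding=1)
        self.block1 = ResidualBlock(channels)
        self.block2 = ResidualBlock(channels)
        # Multiscale fusion: concatenate features from two depths of the
        # same map so shallow detail and deep semantics are both kept.
        self.pool = nn.AdaptiveAvgPool2d((1, None))  # collapse height only
        self.rnn = nn.GRU(2 * channels, hidden, bidirectional=True,
                          batch_first=True)
        self.head = nn.Linear(2 * hidden, num_classes)  # classes incl. CTC blank

    def forward(self, x):                      # x: (B, 1, H, W)
        f1 = self.block1(self.stem(x))
        f2 = self.block2(f1)
        fused = torch.cat([f1, f2], dim=1)     # fuse two feature levels
        seq = self.pool(fused).squeeze(2)      # (B, 2C, W)
        seq = seq.transpose(1, 2)              # (B, W, 2C): width as time axis
        out, _ = self.rnn(seq)
        return self.head(out).log_softmax(-1)  # CTC expects log-probs

# CTC training step: no frame-level alignment between image columns and
# note labels is required, which is the dataset relaxation the abstract cites.
model = NoteRecognizer(num_classes=81)
ctc = nn.CTCLoss(blank=0)
images = torch.randn(4, 1, 64, 256)
logits = model(images).transpose(0, 1)          # (T, B, C) for CTCLoss
targets = torch.randint(1, 81, (4, 10))         # label IDs; 0 reserved for blank
in_lens = torch.full((4,), logits.size(0), dtype=torch.long)
tgt_lens = torch.full((4,), 10, dtype=torch.long)
loss = ctc(logits, targets, in_lens, tgt_lens)
loss.backward()
```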
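A similarly hedged sketch of the cross-modal branch: a K-nearest-neighbor modal graph, a symmetric graph convolutional encoder, and a fully connected encoder whose outputs are fused into the common subspace, trained with an intramodal semantic term plus an intermodal invariance term. The value of K, the layer sizes, the addition-based fusion, and the MSE invariance loss are illustrative assumptions; the paper's exact formulation may differ.

```python
# Hedged sketch: KNN modal graph + symmetric GCN/MLP encoders per modality,
# jointly optimized with semantic and modality-invariance constraints.
import torch
import torch.nn as nn
import torch.nn.functional as F

def knn_graph(feats, k=5):
    # Build a symmetrically normalized KNN adjacency within one modality.
    dist = torch.cdist(feats, feats)
    idx = dist.topk(k + 1, largest=False).indices[:, 1:]  # drop self-match
    n = feats.size(0)
    adj = torch.zeros(n, n)
    adj.scatter_(1, idx, 1.0)
    adj = ((adj + adj.t()) > 0).float() + torch.eye(n)    # symmetrize + self-loop
    d_inv = adj.sum(1).pow(-0.5).diag()
    return d_inv @ adj @ d_inv                            # D^-1/2 A D^-1/2

class GCNLayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, x, adj):
        return F.relu(adj @ self.lin(x))   # propagate features over the modal graph

class ModalityEncoder(nn.Module):
    """One per modality; both branches share this symmetric design."""
    def __init__(self, d_in, d_common=256, num_classes=10):
        super().__init__()
        self.gcn1 = GCNLayer(d_in, 512)
        self.gcn2 = GCNLayer(512, d_common)
        self.mlp = nn.Sequential(nn.Linear(d_in, 512), nn.ReLU(),
                                 nn.Linear(512, d_common))
        self.cls = nn.Linear(d_common, num_classes)  # intramodal semantic head

    def forward(self, x, adj):
        g = self.gcn2(self.gcn1(x, adj), adj)  # local-consistency branch
        z = self.mlp(x) + g                    # fusion by addition (assumption)
        return z, self.cls(z)

# Joint objective: a semantic constraint per modality plus an invariance
# term pulling paired image/text codes together in the common subspace.
img, txt = torch.randn(32, 4096), torch.randn(32, 300)
labels = torch.randint(0, 10, (32,))
enc_i, enc_t = ModalityEncoder(4096), ModalityEncoder(300)
zi, pi = enc_i(img, knn_graph(img))
zt, pt = enc_t(txt, knn_graph(txt))
loss = (F.cross_entropy(pi, labels) + F.cross_entropy(pt, labels)
        + F.mse_loss(zi, zt))                  # modality-invariance term
loss.backward()
```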