Changchun Humanities and Sciences College, Changchun 130117, Jilin, China.
Comput Intell Neurosci. 2022 Aug 29;2022:5611456. doi: 10.1155/2022/5611456. eCollection 2022.
This paper designs a multimodal convolutional neural network model for intelligent analysis of the influence of music genres on children's emotions. Because genre features in the audio power spectrogram are diverse, the feature extraction stage applies Mel filtering: dimensionality reduction of the Mel-filtered signal retains the genre attributes of the audio signal while deepening the differences between features extracted from different genres. To reduce the input size and expand the model's training scale, the power spectrogram obtained by feature extraction is cropped in the model input stage. The MSCN-LSTM consists of two modules: a multiscale convolutional kernel convolutional neural network (MSCNN) and a long short-term memory (LSTM) network. The MSCNN extracts EEG signal features, the LSTM extracts temporal features of the eye-movement signal, and the two are combined by feature-level fusion. The multimodal signal achieves higher emotion classification accuracy than any unimodal signal: the average accuracy of four-class emotion classification based on the 6-channel EEG signal and the children's multimodal signal reaches 97.94%. After pretraining on the Million Song Dataset (MSD), the model improves further, with the Dense Inception network reaching 91.0% accuracy on the GTZAN dataset and 89.91% on the ISMIR2004 dataset, demonstrating the network's effectiveness and advancement.
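The Mel filtering step described above can be illustrated with a minimal sketch. The filter shapes, bin mapping, and parameter values here are standard textbook choices, not taken from the paper: triangular filters spaced evenly on the Mel scale project a high-dimensional power spectrum down to a small number of Mel bands, which is the dimensionality reduction the abstract refers to.

```python
import math

def hz_to_mel(f):
    """Convert frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Build triangular Mel filters mapping an (n_fft//2 + 1)-bin power
    spectrum down to n_filters Mel bands (the dimensionality reduction
    step described in the abstract)."""
    f_max = f_max or sample_rate / 2.0
    # Equally spaced points on the Mel scale, converted back to Hz.
    lo_mel, hi_mel = hz_to_mel(f_min), hz_to_mel(f_max)
    mels = [lo_mel + i * (hi_mel - lo_mel) / (n_filters + 1)
            for i in range(n_filters + 2)]
    hz = [mel_to_hz(m) for m in mels]
    # Map the Hz edge frequencies to FFT bin indices.
    bins = [int((n_fft + 1) * f / sample_rate) for f in hz]
    n_bins = n_fft // 2 + 1
    filters = [[0.0] * n_bins for _ in range(n_filters)]
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):      # rising slope of the triangle
            if center > left:
                filters[i][k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope of the triangle
            if right > center:
                filters[i][k] = (right - k) / (right - center)
    return filters

def apply_filterbank(power_spectrum, filters):
    """Project one power-spectrum frame onto the Mel bands."""
    return [sum(w * p for w, p in zip(f, power_spectrum)) for f in filters]
```

In practice a library such as librosa would perform this step; the sketch only shows why the output has far fewer dimensions (one value per Mel band) than the raw spectrogram frame.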
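The MSCNN/LSTM feature-level fusion can be sketched as follows. This is a toy illustration, not the paper's architecture: the kernels, the single-unit LSTM, and the scalar signals are hypothetical stand-ins, and real EEG/eye-movement branches would be trained deep networks. The point is the structure: each branch produces a feature vector, and feature-level fusion concatenates the vectors before classification.

```python
import math

def conv1d(x, kernel):
    """Valid-mode 1-D convolution of a signal with one kernel."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

def multiscale_conv_features(signal, kernels):
    """MSCNN-style branch: convolve the signal with kernels of several
    sizes, then global-average-pool each branch into one feature."""
    feats = []
    for kern in kernels:
        out = conv1d(signal, kern)
        feats.append(sum(out) / len(out))  # global average pooling
    return feats

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_last_hidden(seq, w=0.5, u=0.3):
    """Toy single-unit LSTM over a scalar sequence; the final hidden
    state serves as the temporal feature of the sequence."""
    h = c = 0.0
    for x in seq:
        i = sigmoid(w * x + u * h)        # input gate
        f = sigmoid(w * x + u * h + 1.0)  # forget gate (bias favors memory)
        o = sigmoid(w * x + u * h)        # output gate
        g = math.tanh(w * x + u * h)      # candidate cell value
        c = f * c + i * g
        h = o * math.tanh(c)
    return h

def fuse(eeg_signal, eye_signal):
    """Feature-level fusion: concatenate the EEG branch features with
    the eye-movement branch feature into one joint vector."""
    eeg_feats = multiscale_conv_features(eeg_signal,
                                         [[1, 0, -1], [1, 1, 0, -1, -1]])
    eye_feats = [lstm_last_hidden(eye_signal)]
    return eeg_feats + eye_feats
```

A classifier would then operate on the fused vector, which is why multimodal input can outperform either modality alone: the joint representation carries both spatial (EEG) and temporal (eye-movement) information.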