Mu Guangyu, Chen Ying, Li Xiurong, Dai Li, Dai Jiaxiu
School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun, China.
Key Laboratory of Financial Technology of Jilin Province, Changchun, China.
PLoS One. 2025 Apr 28;20(4):e0321011. doi: 10.1371/journal.pone.0321011. eCollection 2025.
The rapid development of social media has significantly impacted sentiment analysis, which is essential for understanding public opinion and predicting social trends. However, modality fusion in sentiment analysis can introduce substantial noise due to differences in semantic representation across modalities, ultimately degrading classification accuracy. This paper therefore presents a Semantic Enhancement and Cross-Modal Interaction Fusion (SECIF) model for sentiment analysis to address these issues. First, BERT and ResNet extract feature representations from text and images, respectively. Second, a GMHA mechanism is proposed to aggregate important semantic information and mitigate the influence of noise. Then, an ICN module is constructed to capture complex contextual dependencies and strengthen text feature representations. Finally, a cross-modal interaction fusion module is implemented in which text features are primary and image features auxiliary, enabling deep integration of textual and visual information. Model performance is optimized by combining cross-entropy and KL divergence losses. Experiments are conducted on a dataset collected from public opinion events on Sina Weibo, and the results demonstrate that the proposed model outperforms the comparison models: SECIF improves average accuracy by 11.19%, 82.27%, and 4.83% over text-only, image-only, and multimodal models, respectively. The proposed SECIF model is also compared with ten baseline models on publicly available datasets, where it improves accuracy by 4.70% and F1 score by 6.56%. Through multimodal sentiment analysis, governments can better understand public emotions and opinion trends, facilitating more targeted and effective management strategies.
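To make the described pipeline concrete, the following is a minimal PyTorch sketch of the text-primary, image-auxiliary cross-modal fusion and the combined cross-entropy/KL objective. It is an illustration under assumptions, not the authors' implementation: the abstract does not specify dimensions, the attention wiring, or what distribution the KL term is measured against, so the module names (`CrossModalFusion`, `combined_loss`), the choice of cross-attention with a residual connection, and the use of a reference branch's logits for the KL term are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalFusion(nn.Module):
    """Hypothetical sketch of text-primary, image-auxiliary fusion.

    Text features act as attention queries over image features, so the
    fused representation stays anchored to the text modality, loosely
    following the abstract's description. All layer choices and
    dimensions are assumptions.
    """
    def __init__(self, dim=768, heads=8, num_classes=3):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text_feats, image_feats):
        # text_feats:  (B, T, dim), e.g. BERT token embeddings
        # image_feats: (B, R, dim), e.g. ResNet region features projected to dim
        attended, _ = self.cross_attn(text_feats, image_feats, image_feats)
        fused = self.norm(text_feats + attended)   # residual keeps text primary
        return self.classifier(fused.mean(dim=1))  # mean-pool tokens -> logits

def combined_loss(logits, labels, ref_logits, alpha=0.5):
    """Cross-entropy plus a KL divergence term, as the abstract combines them.

    The abstract does not say what the KL divergence is computed against;
    here it is assumed to regularize the fused prediction toward a
    reference branch (e.g. a text-only head), weighted by alpha.
    """
    ce = F.cross_entropy(logits, labels)
    kl = F.kl_div(F.log_softmax(logits, dim=-1),
                  F.softmax(ref_logits, dim=-1),
                  reduction="batchmean")
    return ce + alpha * kl
```

As a design note, routing attention queries from text rather than images is one straightforward way to realize "text primary, image auxiliary": the image branch can only reweight and enrich the text representation, which limits how much visual noise can propagate into the final classification.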