Zhao Xuefeng, Wang Yuxiang, Zhong Zhaoman
School of Computer Engineering, Jiangsu Ocean University, Lianyungang 222005, China.
Sensors (Basel). 2025 Apr 17;25(8):2553. doi: 10.3390/s25082553.
The rapid development of social media has driven demand for opinion mining and sentiment analysis on multimodal samples. As a fine-grained task within multimodal sentiment analysis, aspect-based multimodal sentiment analysis (ABMSA) determines the sentiment polarity of aspect-level targets accurately and efficiently. However, traditional ABMSA methods often perform suboptimally on social media samples, because the images in these samples typically contain embedded text that conventional models overlook, yet such text influences sentiment judgment. To address this issue, we propose a text-in-image enhanced self-supervised alignment model (TESAM) that accounts for multimodal information more comprehensively. Specifically, we employ optical character recognition (OCR) to extract the embedded text from images and, on the principle that text-in-image is an integral part of the visual modality, fuse it with visual features to obtain more comprehensive image representations. In addition, we incorporate aspect words to guide the model in disregarding irrelevant semantic features, thereby reducing noise interference. Furthermore, to mitigate the semantic gap between modalities, we pre-train the feature extraction module with self-supervised alignment: during this stage, the unimodal semantic embeddings of the two modalities are aligned by computing errors with Euclidean distance and cosine similarity. Experimental results show that TESAM achieves strong performance on three ABMSA benchmarks, validating the rationale and effectiveness of the proposed improvements.
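To make the alignment objective concrete, the following is a minimal sketch (not the authors' released code) of a pre-training loss that penalizes misaligned text and image embeddings with both a Euclidean-distance term and a cosine-similarity term, as the abstract describes; the equal weighting of the two terms, the batch-mean reduction, and the embedding dimension are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def alignment_loss(text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
    """Self-supervised alignment error between paired unimodal embeddings.

    Combines a Euclidean-distance term with a cosine-similarity term;
    the equal weighting and mean reduction are assumptions.
    """
    # Euclidean distance between each paired text and image embedding.
    euclidean = torch.norm(text_emb - image_emb, p=2, dim=-1)
    # Cosine term: 1 - cos(theta) vanishes when the embeddings point the same way.
    cosine = 1.0 - F.cosine_similarity(text_emb, image_emb, dim=-1)
    # Average over the batch (assumed reduction).
    return (euclidean + cosine).mean()

# Usage: align a batch of 32 paired 512-d embeddings (dimension assumed).
text_emb = torch.randn(32, 512, requires_grad=True)
image_emb = torch.randn(32, 512, requires_grad=True)
loss = alignment_loss(text_emb, image_emb)
loss.backward()  # gradients would flow to both unimodal encoders during pre-training
```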