
Text-image semantic relevance identification for aspect-based multimodal sentiment analysis.

Authors

Zhang Tianzhi, Zhou Gang, Lu Jicang, Li Zhibo, Wu Hao, Liu Shuo

Affiliation

Information Engineering University, Zhengzhou, Henan, China.

Publication

PeerJ Comput Sci. 2024 Apr 12;10:e1904. doi: 10.7717/peerj-cs.1904. eCollection 2024.

Abstract

Aspect-based multimodal sentiment analysis (ABMSA) is an emerging task in multimodal sentiment analysis research that aims to identify the sentiment of each aspect mentioned in a multimodal sample. Although recent research on ABMSA has achieved some success, most existing models only adopt an attention mechanism to interact the aspect with the text and image respectively and obtain the sentiment output through multimodal concatenation; they often neglect that some samples may have no semantic relevance between text and image. In this article, we propose a Text-Image Semantic Relevance Identification (TISRI) model for ABMSA to address this problem. Specifically, we introduce a multimodal feature relevance identification module to calculate the semantic similarity between text and image, and then construct an image gate to dynamically control the input image information. On this basis, image auxiliary information is provided to enhance the semantic expressiveness of the visual feature representation and generate a more intuitive image representation. Furthermore, we employ an attention mechanism during multimodal feature fusion to obtain a text-aware image representation through text-image interaction, preventing irrelevant image information from interfering with the model. Experiments demonstrate that TISRI achieves competitive results on two ABMSA Twitter datasets, validating the effectiveness of our methods.
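The relevance-gating idea in the abstract can be sketched minimally: compute a text-image similarity score, map it to a gate value in (0, 1), and scale the image features by that gate so weakly related images contribute less. This is a hypothetical illustration, not TISRI's actual module; the paper's model uses learned feature extractors and parameters, whereas the `cosine_similarity`, `image_gate`, and `temperature` names below are assumptions introduced for this sketch.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def image_gate(text_vec, image_vec, temperature=1.0):
    """Scale image features by a text-image relevance score in (0, 1).

    Hypothetical stand-in for TISRI's relevance identification module:
    a sigmoid over cosine similarity acts as the image gate.
    """
    sim = cosine_similarity(text_vec, image_vec)
    gate = 1.0 / (1.0 + math.exp(-sim / temperature))
    # A semantically irrelevant image (low similarity) is attenuated;
    # a relevant one passes through with a gate value closer to 1.
    return [gate * x for x in image_vec], gate

# Example: a roughly aligned text/image feature pair keeps most image signal.
gated, g = image_gate([0.9, 0.1, 0.3], [0.8, 0.2, 0.4])
```

In the full model this gated image representation would then be fused with the text representation via attention rather than plain concatenation, so only text-aware image information reaches the sentiment classifier.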

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40d8/11636758/cfd3ff5dd2fc/peerj-cs-10-1904-g001.jpg
