Weng Yu, Chen Lin, Wang Sen, Ye Xuming, Liu Xuan, Liu Zheng
Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing, 100081, China.
School of Information Engineering, Minzu University of China, Beijing, 100081, China.
Heliyon. 2024 Jun 14;10(12):e32967. doi: 10.1016/j.heliyon.2024.e32967. eCollection 2024 Jun 30.
Aspect-level sentiment analysis in multimodal contexts, which aims to precisely identify and interpret the sentiment attitudes linked to a target aspect across diverse data modalities, remains an active research area driving innovation in artificial intelligence. However, most existing methods extract visual features from only one facet, such as facial expressions, and ignore information from other key facets, such as the textual content embedded in the image modality, which leads to information loss. To overcome this limitation, we propose a novel approach, Multi-faceted Information Extraction and Cross-mixture Fusion (MIECF), for Multimodal Aspect-based Sentiment Analysis. Our approach captures more comprehensive visual information from the image and integrates local and global key features drawn from multiple facets: local features, such as facial expressions and embedded text, provide direct and rich emotional cues, whereas the global feature reflects the overall emotional atmosphere and context. To enhance the visual representation, we design a Cross-mixture Fusion method that integrates this local and global multimodal information. In particular, the method establishes semantic relationships between local and global features to eliminate the ambiguity introduced by single-facet information and to achieve a more accurate contextual understanding, yielding a richer and more precise basis for sentiment analysis. Experimental results show that our proposed approach achieves leading performance, with an accuracy of 79.65% on the Twitter-2015 dataset and Macro-F1 scores of 75.90% and 73.11% on the Twitter-2015 and Twitter-2017 datasets, respectively.
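To make the fusion step concrete, the following is a minimal PyTorch sketch of one plausible realization of the cross-mixture idea, assuming the semantic relationship between local and global features is modeled with bidirectional cross-attention. The abstract does not specify the exact mechanism, so the module name `CrossMixtureFusion` and all dimensions and hyperparameters here are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of cross-mixture fusion between local visual cues
# (e.g., face and embedded-text region embeddings) and global scene features.
# Assumes bidirectional cross-attention; names and sizes are illustrative.
import torch
import torch.nn as nn

class CrossMixtureFusion(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        # Local features attend to the global context, and vice versa.
        self.local_to_global = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.global_to_local = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_l = nn.LayerNorm(d_model)
        self.norm_g = nn.LayerNorm(d_model)
        self.mix = nn.Linear(2 * d_model, d_model)  # mix the two enriched streams

    def forward(self, local_feats: torch.Tensor, global_feats: torch.Tensor) -> torch.Tensor:
        # local_feats:  (B, N_local, d)  e.g. face / OCR-text region embeddings
        # global_feats: (B, N_global, d) e.g. patch embeddings of the whole image
        l_ctx, _ = self.local_to_global(local_feats, global_feats, global_feats)
        g_ctx, _ = self.global_to_local(global_feats, local_feats, local_feats)
        l_enriched = self.norm_l(local_feats + l_ctx)   # local cues, context-aware
        g_enriched = self.norm_g(global_feats + g_ctx)  # global scene, cue-aware
        # Pool each stream and mix into a single fused visual representation.
        fused = torch.cat([l_enriched.mean(dim=1), g_enriched.mean(dim=1)], dim=-1)
        return self.mix(fused)  # (B, d) visual feature for downstream sentiment prediction

# Example: 4 local regions and 49 global patches per image, batch of 2.
fusion = CrossMixtureFusion()
local = torch.randn(2, 4, 768)
global_ = torch.randn(2, 49, 768)
print(fusion(local, global_).shape)  # torch.Size([2, 768])
```

Letting each stream attend to the other before mixing is one way to realize the disambiguation the abstract describes: a local cue (a smiling face) is reinterpreted in light of the global scene, and the global feature is sharpened by the local evidence.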