

Coarse Alignment of Topic and Sentiment: A Unified Model for Cross-Lingual Sentiment Classification.

Authors

Wang Deqing, Jing Baoyu, Lu Chenwei, Wu Junjie, Liu Guannan, Du Chenguang, Zhuang Fuzhen

Publication

IEEE Trans Neural Netw Learn Syst. 2021 Feb;32(2):736-747. doi: 10.1109/TNNLS.2020.2979225. Epub 2021 Feb 4.

Abstract

Cross-lingual sentiment classification (CLSC) aims to leverage rich labeled resources in the source language to improve prediction models for a resource-scarce domain in the target language. Existing feature-representation-learning approaches try to minimize the difference of latent features between domains by exact alignment, achieved through either one-to-one topic alignment or matrix projection. Exact alignment, however, restricts representation flexibility and degrades model performance on CLSC tasks when the distribution difference between the two language domains is large. Moreover, most previous studies proposed document-level models or ignored the sentiment polarities of topics, which may lead to insufficient learning of latent features. To address these problems, we propose a coarse alignment mechanism that enhances the model's representation through group-to-group topic alignment within an aspect-level fine-grained model. First, we propose an unsupervised aspect, opinion, and sentiment unification model (AOS), which jointly models the aspects, opinions, and sentiments of reviews from different domains and captures more accurate latent feature representations via the coarse alignment mechanism. To further boost AOS, we propose ps-AOS, a partially supervised AOS model in which labeled source-language data help minimize the difference in feature representations between the two language domains with the help of logistic regression. Finally, an expectation-maximization framework with Gibbs sampling is proposed to optimize the model. Extensive experiments on various multilingual product-review data sets show that ps-AOS significantly outperforms various state-of-the-art baselines.
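To illustrate the core idea of coarse alignment, the toy sketch below pools per-topic proportions into a small number of topic *groups* and trains a logistic-regression classifier on those group-level features, standing in for the supervision step in ps-AOS. All names, the synthetic data, and the grouping scheme are hypothetical and not from the paper; the actual model is an aspect-level graphical model optimized with EM and Gibbs sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: each review is a distribution over K topics.
K, G = 8, 2  # 8 fine-grained topics coarsened into 2 groups

# Group-to-group (coarse) alignment: topics are pooled into groups, so the
# source and target domains only need matching *group* semantics rather than
# a one-to-one topic correspondence (the restriction of exact alignment).
groups = np.repeat(np.arange(G), K // G)  # topic -> group assignment
A = np.eye(G)[groups]                     # (K, G) pooling matrix

def coarse_features(theta):
    """Pool per-topic proportions into group-level features."""
    return theta @ A

# Synthetic source-language data: label depends only on group-level mass.
n = 200
theta_src = rng.dirichlet(np.ones(K), size=n)
y_src = (coarse_features(theta_src)[:, 0] > 0.5).astype(float)

# Logistic regression by plain gradient descent on the coarse features,
# mimicking how labeled source data supervises the shared representation.
w, b = np.zeros(G), 0.0
F = coarse_features(theta_src)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))   # predicted probabilities
    w -= 0.5 * (F.T @ (p - y_src) / n)       # gradient step on weights
    b -= 0.5 * np.mean(p - y_src)            # gradient step on bias

# Unlabeled target-language reviews: prediction uses only the shared
# group-level features, not any topic-by-topic correspondence.
theta_tgt = rng.dirichlet(np.ones(K), size=50)
F_tgt = coarse_features(theta_tgt)
pred = 1.0 / (1.0 + np.exp(-(F_tgt @ w + b))) > 0.5
acc = np.mean(pred == (F_tgt[:, 0] > 0.5))
```

The point of the sketch is the pooling matrix `A`: because classification operates on group-level features, the two domains can have entirely different fine-grained topic structures as long as the group semantics correspond, which is what the coarse alignment mechanism exploits.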

