SentiCode: A new paradigm for one-time training and global prediction in multilingual sentiment analysis.

Authors

Kanfoud Mohamed Raouf, Bouramoul Abdelkrim

Affiliation

MISC Laboratory, Constantine 2 University Abdelhamid Mehri, Constantine, 25000 Algeria.

Publication information

J Intell Inf Syst. 2022;59(2):501-522. doi: 10.1007/s10844-022-00714-8. Epub 2022 May 25.

Abstract

The main objective of multilingual sentiment analysis is to analyze reviews regardless of the language in which they were written. Switching between languages is very common on social media platforms, and analyzing such multilingual reviews is challenging because languages differ in syntax, grammar, and other respects. This paper presents SentiCode, a new language-independent representation approach for sentiment analysis. Unlike previous work in multilingual sentiment analysis, the proposed approach does not rely on machine translation to bridge the gap between languages. Instead, it exploits features that languages share, such as the part-of-speech tags used in Universal Dependencies. Equally important, SentiCode enables sentiment analysis in multi-language and multi-domain environments simultaneously. Several experiments using machine learning and deep learning techniques evaluated the performance of SentiCode in multilingual (English, French, German, Arabic, and Russian) and multi-domain environments. In addition, the vocabulary proposed by SentiCode and the effect of each token were evaluated through ablation. The results show that SentiCode reaches 70% accuracy with the best trade-off between efficiency and computing time: training and testing together take about 0.67 seconds, which makes it well suited to real-time applications.
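The abstract does not spell out SentiCode's vocabulary or encoding rules, so the following is only a minimal sketch of the general idea of a language-independent representation built on Universal Dependencies part-of-speech (UPOS) tags. The `POLARITY` lexicons, the `KEPT_UPOS` set, the `encode` function, and the `UPOS_POLARITY` token format are all hypothetical names and choices introduced for illustration; they are not taken from the paper. The input is assumed to be already tagged with UPOS labels.

```python
from typing import Dict, List, Tuple

# Hypothetical toy polarity lexicons; NOT the resources used in the paper.
POLARITY: Dict[str, Dict[str, str]] = {
    "en": {"great": "POS", "terrible": "NEG"},
    "fr": {"excellent": "POS", "horrible": "NEG"},
}

# UPOS tags kept in this sketch (an assumption, not the paper's vocabulary).
KEPT_UPOS = {"ADJ", "ADV", "NOUN", "VERB"}


def encode(tagged: List[Tuple[str, str]], lang: str) -> List[str]:
    """Map (word, UPOS) pairs to language-independent tokens."""
    lexicon = POLARITY.get(lang, {})
    codes = []
    for word, upos in tagged:
        if upos not in KEPT_UPOS:
            continue  # drop determiners, auxiliaries, punctuation, etc.
        polarity = lexicon.get(word.lower())
        # Attach a polarity suffix only when the word is in the lexicon.
        codes.append(f"{upos}_{polarity}" if polarity else upos)
    return codes


# Pre-tagged input; in practice a UD-compatible tagger would produce this.
english = [("The", "DET"), ("movie", "NOUN"), ("is", "AUX"), ("great", "ADJ")]
french = [("Le", "DET"), ("film", "NOUN"), ("est", "AUX"), ("excellent", "ADJ")]

print(encode(english, "en"))
print(encode(french, "fr"))
```

In this toy setting both sentences reduce to the same token sequence, so a classifier trained on sequences from one language could in principle be applied to another without machine translation, which is the property the abstract emphasizes.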
