


Multilingual text categorization and sentiment analysis: a comparative analysis of the utilization of multilingual approaches for classifying twitter data.

Authors

Manias George, Mavrogiorgou Argyro, Kiourtis Athanasios, Symvoulidis Chrysostomos, Kyriazis Dimosthenis

Affiliations

University of Piraeus, Piraeus, Greece.

Publication

Neural Comput Appl. 2023 May 8:1-17. doi: 10.1007/s00521-023-08629-3.

DOI: 10.1007/s00521-023-08629-3
PMID: 37362579
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10165589/
Abstract

Text categorization and sentiment analysis are two of the most typical natural language processing tasks, with various emerging applications implemented and utilized in different domains, such as health care and policy making. At the same time, the tremendous growth in the popularity and usage of social media, such as Twitter, has resulted in an immense increase in user-generated data, mainly represented by the text of users' posts. However, analyzing these data and extracting actionable knowledge and added value from them is challenging, due to the domain diversity and the high degree of multilingualism that characterize them. This highlights the emerging need for domain-agnostic and multilingual solutions. To investigate a portion of these challenges, this work performs a comparative analysis of multilingual approaches for classifying both the sentiment and the text of an examined multilingual corpus. In this context, four multilingual BERT-based classifiers and a zero-shot classification approach are utilized and compared in terms of their accuracy and applicability in classifying multilingual data. The comparison has unveiled insightful outcomes with a twofold interpretation. Multilingual BERT-based classifiers achieve high performance and transfer inference when trained and fine-tuned on multilingual data. The zero-shot approach, meanwhile, presents a faster, more efficient, and more scalable way to create multilingual solutions: it can easily be fitted to new languages and new tasks while achieving relatively good results across many languages. However, when efficiency and scalability are less important than accuracy, this model, and zero-shot models in general, cannot be compared to fine-tuned and trained multilingual BERT-based classifiers.
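The zero-shot approach discussed in the abstract classifies an input by scoring it against candidate labels without any task-specific training. As a rough illustration of that idea only (the paper's multilingual BERT and zero-shot models are not reproduced here; `zero_shot_classify`, the toy label descriptions, and the bag-of-words cosine similarity are all hypothetical stand-ins for real transformer encoders), a minimal sketch:

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words vector: a Counter over lowercase tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def zero_shot_classify(text, label_descriptions):
    """Score `text` against a textual description of each candidate
    label and return the best match -- no task-specific training."""
    v = bow(text)
    scores = {label: cosine(v, bow(desc))
              for label, desc in label_descriptions.items()}
    return max(scores, key=scores.get), scores

# Toy label descriptions (in practice these would be label prompts
# fed to a multilingual sentence encoder or NLI model).
labels = {
    "positive": "great happy love good wonderful excellent",
    "negative": "bad terrible hate awful poor disappointing",
}
best, scores = zero_shot_classify("I love this great phone", labels)
```

Swapping in new labels or languages only requires new label descriptions, which is exactly the flexibility the abstract attributes to zero-shot models; the accuracy gap versus fine-tuned classifiers comes from how crude such untrained scoring is.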


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a30e/10165589/43e8185e4c8e/521_2023_8629_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a30e/10165589/6b842ce3e99b/521_2023_8629_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a30e/10165589/a3a03e1a6274/521_2023_8629_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a30e/10165589/757762c0babb/521_2023_8629_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a30e/10165589/93cdd27fc332/521_2023_8629_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a30e/10165589/85dd9e343638/521_2023_8629_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a30e/10165589/4381197bfba5/521_2023_8629_Fig6_HTML.jpg

Similar Articles

1. Multilingual text categorization and sentiment analysis: a comparative analysis of the utilization of multilingual approaches for classifying twitter data.
Neural Comput Appl. 2023 May 8:1-17. doi: 10.1007/s00521-023-08629-3.
2. Vaccine sentiment analysis using BERT + NBSVM and geo-spatial approaches.
J Supercomput. 2023 May 7:1-31. doi: 10.1007/s11227-023-05319-8.
3. Multi-class sentiment analysis of Urdu text using multilingual BERT.
Sci Rep. 2022 Mar 31;12(1):5436. doi: 10.1038/s41598-022-09381-9.
4. Deep learning based sentiment analysis and offensive language identification on multilingual code-mixed data.
Sci Rep. 2022 Dec 13;12(1):21557. doi: 10.1038/s41598-022-26092-3.
5. On cross-lingual retrieval with multilingual text encoders.
Inf Retr Boston. 2022;25(2):149-183. doi: 10.1007/s10791-022-09406-x. Epub 2022 Mar 7.
6. A Comparison of ChatGPT and Fine-Tuned Open Pre-Trained Transformers (OPT) Against Widely Used Sentiment Analysis Tools: Sentiment Analysis of COVID-19 Survey Data.
JMIR Ment Health. 2024 Jan 25;11:e50150. doi: 10.2196/50150.
7. Sequence-to-sequence pretraining for a less-resourced Slovenian language.
Front Artif Intell. 2023 Mar 28;6:932519. doi: 10.3389/frai.2023.932519. eCollection 2023.
8. Transfer Learning for Sentiment Classification Using Bidirectional Encoder Representations from Transformers (BERT) Model.
Sensors (Basel). 2023 May 31;23(11):5232. doi: 10.3390/s23115232.
9. Heterogeneous text graph for comprehensive multilingual sentiment analysis: capturing short- and long-distance semantics.
PeerJ Comput Sci. 2024 Feb 23;10:e1876. doi: 10.7717/peerj-cs.1876. eCollection 2024.
10. Pashto offensive language detection: a benchmark dataset and monolingual Pashto BERT.
PeerJ Comput Sci. 2023 Oct 18;9:e1617. doi: 10.7717/peerj-cs.1617. eCollection 2023.

Cited By

1. Multilingual sentiment analysis in restaurant reviews using aspect focused learning.
Sci Rep. 2025 Aug 4;15(1):28371. doi: 10.1038/s41598-025-12464-y.
2. One size fits all: Enhanced zero-shot text classification for patient listening on social media.
Front Artif Intell. 2025 Feb 11;7:1397470. doi: 10.3389/frai.2024.1397470. eCollection 2024.
3. Evolving techniques in sentiment analysis: a comprehensive review.
PeerJ Comput Sci. 2025 Jan 28;11:e2592. doi: 10.7717/peerj-cs.2592. eCollection 2025.
4. MuTCELM: An optimal multi-TextCNN-based ensemble learning for text classification.
Heliyon. 2024 Sep 30;10(19):e38515. doi: 10.1016/j.heliyon.2024.e38515. eCollection 2024 Oct 15.
5. Heterogeneous text graph for comprehensive multilingual sentiment analysis: capturing short- and long-distance semantics.
PeerJ Comput Sci. 2024 Feb 23;10:e1876. doi: 10.7717/peerj-cs.1876. eCollection 2024.