Automated stance detection in complex topics and small languages: The challenging case of immigration in polarizing news media.

Affiliations

School of Humanities, Tallinn University, Tallinn, Estonia.

ERA Chair for Cultural Data Analytics, Tallinn University, Tallinn, Estonia.

Publication information

PLoS One. 2024 Apr 26;19(4):e0302380. doi: 10.1371/journal.pone.0302380. eCollection 2024.

DOI:10.1371/journal.pone.0302380

PMID:38669237

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11051607/

Abstract

Automated stance detection and related machine learning methods can provide useful insights for media monitoring and academic research. Many of these approaches require annotated training datasets, which limits their applicability for languages where these may not be readily available. This paper explores the applicability of large language models for automated stance detection in a challenging scenario, involving a morphologically complex, lower-resource language, and a socio-culturally complex topic, immigration. If the approach works in this case, it can be expected to perform as well or better in less demanding scenarios. We annotate a large set of pro- and anti-immigration examples to train and compare the performance of multiple language models. We also probe the usability of GPT-3.5 (that powers ChatGPT) as an instructable zero-shot classifier for the same task. The supervised models achieve acceptable performance, but GPT-3.5 yields similar accuracy. As the latter does not require tuning with annotated data, it constitutes a potentially simpler and cheaper alternative for text classification tasks, including in lower-resource languages. We further use the best-performing supervised model to investigate diachronic trends over seven years in two corpora of Estonian mainstream and right-wing populist news sources, demonstrating the applicability of automated stance detection for news analytics and media monitoring settings even in lower-resource scenarios, and discuss correspondences between stance changes and real-world events.
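The zero-shot setup described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical three-way label set and prompt wording (the abstract does not give the prompt or label names used in the paper), and it treats the language model as an arbitrary text-in, text-out callable named `ask_model` rather than any specific API.

```python
from typing import Callable

# Hypothetical label set; the labels used in the paper are not stated in the abstract.
LABELS = ("supportive", "against", "neutral")


def build_prompt(sentence: str, topic: str = "immigration") -> str:
    """Compose a zero-shot instruction asking for exactly one stance label."""
    options = ", ".join(LABELS)
    return (
        f"Classify the stance of the following sentence towards {topic}. "
        f"Answer with exactly one word from: {options}.\n"
        f"Sentence: {sentence}\nStance:"
    )


def parse_label(reply: str) -> str:
    """Map a free-text model reply to a known label, or 'unparsed'."""
    cleaned = reply.strip().lower().strip(".\"' ")
    for label in LABELS:
        if cleaned.startswith(label):
            return label
    return "unparsed"


def classify(sentence: str, ask_model: Callable[[str], str]) -> str:
    """Run one zero-shot classification.

    ask_model is an assumed stand-in for any instructable LLM call.
    """
    return parse_label(ask_model(build_prompt(sentence)))


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Share of predictions that match the human annotation."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

Because no annotated training data enters this loop, the only labelled examples needed are a held-out set for `accuracy`, which is the cost advantage the abstract points to. Replies that do not match a label are kept as `unparsed` rather than guessed, so they count as errors in the evaluation.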

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e92/11051607/4d977db64155/pone.0302380.g001.jpg
