Automated stance detection in complex topics and small languages: The challenging case of immigration in polarizing news media.

Affiliations

School of Humanities, Tallinn University, Tallinn, Estonia.

ERA Chair for Cultural Data Analytics, Tallinn University, Tallinn, Estonia.

Publication information

PLoS One. 2024 Apr 26;19(4):e0302380. doi: 10.1371/journal.pone.0302380. eCollection 2024.

Abstract

Automated stance detection and related machine learning methods can provide useful insights for media monitoring and academic research. Many of these approaches require annotated training datasets, which limits their applicability for languages where these may not be readily available. This paper explores the applicability of large language models for automated stance detection in a challenging scenario, involving a morphologically complex, lower-resource language, and a socio-culturally complex topic, immigration. If the approach works in this case, it can be expected to perform as well or better in less demanding scenarios. We annotate a large set of pro- and anti-immigration examples to train and compare the performance of multiple language models. We also probe the usability of GPT-3.5 (that powers ChatGPT) as an instructable zero-shot classifier for the same task. The supervised models achieve acceptable performance, but GPT-3.5 yields similar accuracy. As the latter does not require tuning with annotated data, it constitutes a potentially simpler and cheaper alternative for text classification tasks, including in lower-resource languages. We further use the best-performing supervised model to investigate diachronic trends over seven years in two corpora of Estonian mainstream and right-wing populist news sources, demonstrating the applicability of automated stance detection for news analytics and media monitoring settings even in lower-resource scenarios, and discuss correspondences between stance changes and real-world events.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e92/11051607/4d977db64155/pone.0302380.g001.jpg
