Institute of Communication and Media Studies, University of Bern, Bern, Switzerland.
Social Computing Group, University of Zurich, Zurich, Switzerland.
PLoS One. 2024 Nov 18;19(11):e0312865. doi: 10.1371/journal.pone.0312865. eCollection 2024.
To understand and measure political information consumption in the high-choice media environment, we need new methods to trace individual interactions with online content and novel techniques to analyse and detect politics-related information. In this paper, we report the results of a comparative analysis of the performance of automated content analysis techniques for detecting political content in the German language across different platforms. Using three validation datasets, we compare the performance of three groups of detection techniques relying on dictionaries, classic supervised machine learning, and deep learning. We also examine the impact of different modes of data preprocessing on the low-cost implementations of these techniques using a large set (n = 66) of models. Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by deep learning- and classic machine learning-based models, in contrast to the more robust performance of dictionary-based models on noisy data.
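As a rough illustration of the simplest of the three technique groups named above, the following is a minimal sketch of a dictionary-based detector with switchable preprocessing, scored with F1 against hand-assigned labels. It is not the authors' implementation: the term list `POLITICAL_TERMS`, the helper names (`preprocess`, `is_political`, `f1_score`), the example documents, and their labels are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mini-dictionary of German political terms (illustrative only,
# not the dictionary used in the paper).
POLITICAL_TERMS = {"bundestag", "wahl", "regierung", "partei", "minister"}


def preprocess(text, lowercase=True, strip_punct=True):
    """Tokenise text on whitespace, with optional preprocessing steps."""
    if lowercase:
        text = text.lower()
    if strip_punct:
        # \w is Unicode-aware in Python 3, so umlauts and ß are preserved.
        text = re.sub(r"[^\w\s]", " ", text)
    return text.split()


def is_political(text, min_hits=1, **prep_options):
    """Flag a document as political if it contains enough dictionary terms."""
    tokens = preprocess(text, **prep_options)
    hits = sum(1 for token in tokens if token in POLITICAL_TERMS)
    return hits >= min_hits


def f1_score(y_true, y_pred):
    """F1 for the positive (political) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical validation set: (document, is_political) pairs.
docs = [
    "Die Regierung plant eine neue Wahl.",
    "Das Wetter in Bern ist heute sonnig.",
    "Der Minister besucht den Bundestag!",
    "Rezept: Apfelkuchen mit Zimt",
]
labels = [True, False, True, False]

# Compare two preprocessing modes on the same data.
for options in ({"strip_punct": True}, {"strip_punct": False}):
    predictions = [is_political(doc, **options) for doc in docs]
    print(options, f1_score(labels, predictions))
```

The same evaluation loop could wrap a supervised or deep learning classifier in place of `is_political`, which is one way to compare technique groups and preprocessing modes on a common validation set.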