Department of Cybernetics and Artificial Intelligence, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, 04200 Košice, Slovakia.
Sensors (Basel). 2022 Nov 30;22(23):9319. doi: 10.3390/s22239319.
This article focuses on the problem of detecting disinformation about COVID-19 in online discussions. As the Internet expands, so does the amount of content on it. In addition to fact-based content, a large amount of manipulated content is being published, which negatively affects society as a whole. This effect is currently compounded by the ongoing COVID-19 pandemic, which has caused people to spend even more time online and to become more invested in such fake content. This work gives a brief overview of what toxic information looks like, how it spreads, and how its dissemination might be prevented through early recognition of disinformation using deep learning. We investigated the overall suitability of deep learning for detecting disinformation in conversational content and compared architectures based on convolutional and recurrent principles. We trained three detection models based on three architectures: CNN (convolutional neural network), LSTM (long short-term memory), and their combination. We achieved the best results with the LSTM (F1 = 0.8741, accuracy = 0.8628), but the results of all three architectures were comparable; for example, the CNN+LSTM architecture achieved F1 = 0.8672 and accuracy = 0.852. The paper's finding is that introducing a convolutional component brings no significant improvement. Compared with our previous work, we note that of all forms of antisocial posts, disinformation is the most difficult to recognize, since, unlike hate speech or toxic posts, it has no distinctive language.
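As a rough illustration of the recurrent approach compared in the abstract, the sketch below shows how an LSTM-based binary classifier reduces a sequence of token embeddings for one post to a single disinformation probability, and how F1 and accuracy scores of the kind reported above are computed from a confusion matrix. All dimensions, weights, and counts here are illustrative assumptions, not the authors' actual model configuration or data.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x_seq, W, U, b, h0, c0):
    """Run a single-layer LSTM over a sequence of embedding vectors.

    x_seq: (seq_len, input_dim) token embeddings for one post
    W: (4*hidden, input_dim), U: (4*hidden, hidden), b: (4*hidden,)
    Gate pre-activations are stacked as [input, forget, cell, output].
    """
    h, c = h0, c0
    hidden = h0.shape[0]
    for x in x_seq:
        z = W @ x + U @ h + b
        i = sigmoid(z[0 * hidden:1 * hidden])   # input gate
        f = sigmoid(z[1 * hidden:2 * hidden])   # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])   # candidate cell state
        o = sigmoid(z[3 * hidden:4 * hidden])   # output gate
        c = f * c + i * g                       # update cell memory
        h = o * np.tanh(c)                      # expose new hidden state
    return h  # final hidden state summarizes the whole post

def metrics(tp, fp, fn, tn):
    """F1 and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return f1, accuracy

# Toy forward pass with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
seq_len, input_dim, hidden = 12, 8, 16
x_seq = rng.normal(size=(seq_len, input_dim))
W = rng.normal(scale=0.1, size=(4 * hidden, input_dim))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h = lstm_forward(x_seq, W, U, b, np.zeros(hidden), np.zeros(hidden))

# Dense sigmoid head: probability that the post is disinformation.
w_out = rng.normal(scale=0.1, size=hidden)
p = sigmoid(w_out @ h)

# Hypothetical confusion-matrix counts for a test set of 200 posts.
f1, acc = metrics(tp=90, fp=10, fn=15, tn=85)
```

In a trained model, `W`, `U`, `b`, and `w_out` would be learned from labeled posts; a CNN+LSTM variant would instead feed the embeddings through convolution and pooling layers before the recurrent pass.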