Luo Jia, El Baz Didier, Shi Lei
College of Economics and Management, Beijing University of Technology, Beijing, China.
Chongqing Research Institute, Beijing University of Technology, Chongqing, China.
Digit Health. 2024 Oct 7;10:20552076241284773. doi: 10.1177/20552076241284773. eCollection 2024 Jan-Dec.
To address the difficulty of distinguishing truth from falsehood during the COVID-19 infodemic, this paper applies deep learning models to the ternary classification of infodemic records.
Eight commonly used deep learning models are employed to categorize collected records as true, false, or uncertain. These models include fastText, three models based on recurrent neural networks, two models based on convolutional neural networks, and two transformer-based models.
Precision, recall, and F1-score for each category, along with overall accuracy, are reported to establish benchmark results. In addition, the confusion matrices are analyzed in detail to provide insight into the models' performance.
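As an illustration of how these metrics relate to a three-class confusion matrix, the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the label names follow the three categories above, while the function names and the six example records are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter

# The three categories used in the paper; the order fixes row/column indices.
LABELS = ("true", "false", "uncertain")


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are gold labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_metrics(matrix, labels=LABELS):
    """Precision, recall, and F1-score for each category."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # rest of the row
        fp = sum(row[i] for row in matrix) - tp   # rest of the column
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def accuracy(matrix):
    """Share of records on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical gold labels and predictions (illustrative only, not study data).
y_true = ["true", "true", "false", "false", "uncertain", "uncertain"]
y_pred = ["true", "false", "false", "false", "uncertain", "true"]

matrix = confusion_matrix(y_true, y_pred)
scores = per_class_metrics(matrix)
overall = accuracy(matrix)
```

Reading the off-diagonal cells of `matrix` shows which categories are confused with one another, which is the kind of information a confusion-matrix analysis draws on beyond the aggregate scores.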
Given the limited availability of infodemic records and the relatively modest size of the two tested data sets, models with pretrained embeddings or simpler architectures tend to outperform their more complex counterparts. This highlights the potential efficiency of pretrained or simpler models for ternary classification in COVID-19 infodemic detection and underscores the need for further research in this area.