College of Pharmacy, Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul 03760, Korea.
College of Pharmacy, Yonsei University, Incheon 21983, Korea.
Int J Environ Res Public Health. 2022 Apr 22;19(9):5126. doi: 10.3390/ijerph19095126.
Garlic-related misinformation becomes prevalent whenever a virus outbreak occurs. With the outbreak of COVID-19, garlic-related misinformation has spread through social media, including Twitter. Bidirectional Encoder Representations from Transformers (BERT) can be used to identify misinformation among a vast number of tweets. This study aimed to apply BERT models to classify misinformation about garlic and COVID-19 on Twitter, using 5929 original tweets mentioning garlic and COVID-19 (4151 for fine-tuning, 1778 for testing). Tweets were manually labeled as 'misinformation' or 'other.' We fine-tuned five BERT models (BERT-base, BERT-large, BERTweet-base, BERTweet-COVID-19, and BERTweet-large) using either a general COVID-19 rumor dataset or a garlic-specific dataset. Accuracy and F1 score were calculated to evaluate model performance. The BERT models fine-tuned with the COVID-19 rumor dataset performed poorly, with a maximum accuracy of 0.647. The models fine-tuned with the garlic-specific dataset performed better: the BERTweet models achieved accuracy of 0.897-0.911, while BERT-base and BERT-large achieved accuracy of 0.887-0.897. BERTweet-large performed best, with a maximum accuracy of 0.911 and an F1 score of 0.894. Thus, BERT models fine-tuned on topic-specific data performed well in classifying misinformation. The results of our study will help detect misinformation related to garlic and COVID-19 on Twitter.
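The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes that 'misinformation' is treated as the positive class and that F1 is computed for that class only; the abstract does not state the averaging scheme, so both are assumptions. The function name `binary_metrics` and the toy labels are hypothetical and are not data from the study.

```python
def binary_metrics(y_true, y_pred, positive="misinformation"):
    """Return accuracy, precision, recall and F1 for a two-class labeling.

    Assumption: `positive` is the class of interest ('misinformation');
    every other label counts as negative ('other').
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (not tweets from the study).
toy_true = ["misinformation", "misinformation", "other", "other", "misinformation"]
toy_pred = ["misinformation", "other", "other", "misinformation", "misinformation"]
toy_scores = binary_metrics(toy_true, toy_pred)
```

On the toy labels there are two true positives, one false positive, one false negative and one true negative, so accuracy is 3/5 and F1 is 2/3. In practice the same function would be applied to the manual labels and model predictions for the held-out test tweets.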