Fine-Tuning BERT Models to Classify Misinformation on Garlic and COVID-19 on Twitter.

Affiliations

College of Pharmacy, Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul 03760, Korea.

College of Pharmacy, Yonsei University, Incheon 21983, Korea.

Publication information

Int J Environ Res Public Health. 2022 Apr 22;19(9):5126. doi: 10.3390/ijerph19095126.

DOI:10.3390/ijerph19095126

PMID:35564518

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9103576/

Abstract

Garlic-related misinformation is prevalent whenever a virus outbreak occurs. With the outbreak of COVID-19, garlic-related misinformation is spreading through social media, including Twitter. Bidirectional Encoder Representations from Transformers (BERT) can be used to classify misinformation from a vast number of tweets. This study aimed to apply the BERT model for classifying misinformation on garlic and COVID-19 on Twitter, using 5929 original tweets mentioning garlic and COVID-19 (4151 for fine-tuning, 1778 for testing). Tweets were manually labeled as 'misinformation' or 'other.' We fine-tuned five BERT models (BERT-base, BERT-large, BERTweet-base, BERTweet-COVID-19, and BERTweet-large) using either a general COVID-19 rumor dataset or a garlic-specific dataset. Accuracy and F1 score were calculated to evaluate the performance of the models. The BERT models fine-tuned with the COVID-19 rumor dataset showed poor performance, with a maximum accuracy of 0.647. BERT models fine-tuned with the garlic-specific dataset showed better performance: the BERTweet models achieved accuracy of 0.897-0.911, while BERT-base and BERT-large achieved accuracy of 0.887-0.897. BERTweet-large showed the best performance, with a maximum accuracy of 0.911 and an F1 score of 0.894. Thus, BERT models showed good performance in classifying misinformation. The results of our study will help detect misinformation related to garlic and COVID-19 on Twitter.
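As an illustration of the two evaluation metrics named in the abstract, the following minimal Python sketch computes accuracy and F1 score for a binary 'misinformation' versus 'other' task. It is not the authors' code. The function name, the label encoding (1 for 'misinformation', 0 for 'other'), and the toy labels are assumptions made for this example, and the abstract does not state whether F1 was computed on the 'misinformation' class alone or averaged across classes; the sketch assumes the former. Only the tweet counts (5929, 4151, 1778) come from the abstract.

```python
from typing import Sequence, Tuple

# Tweet counts reported in the abstract.
TOTAL_TWEETS = 5929
FINE_TUNE_TWEETS = 4151
TEST_TWEETS = 1778

# The two subsets add up to the full dataset, which is roughly a 70/30 split.
assert FINE_TUNE_TWEETS + TEST_TWEETS == TOTAL_TWEETS
fine_tune_fraction = round(FINE_TUNE_TWEETS / TOTAL_TWEETS, 2)


def accuracy_and_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (accuracy, F1) with F1 computed for the `positive` class.

    Assumed encoding for this example: 1 = 'misinformation', 0 = 'other'.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all.
    denominator = 2 * tp + fp + fn
    f1 = (2 * tp / denominator) if denominator else 0.0
    return accuracy, f1


# Toy labels for illustration only; these are not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"fine-tuning fraction: {fine_tune_fraction}")
print(f"accuracy: {acc:.3f}, F1: {f1:.3f}")
```

Reporting F1 alongside accuracy is useful when the two classes are imbalanced, because accuracy alone can look high for a model that rarely predicts the minority class.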
