Combat COVID-19 infodemic using explainable natural language processing models.

Author information

Ayoub Jackie, Yang X Jessie, Zhou Feng

Affiliations

Industrial and Manufacturing Systems Engineering, University of Michigan-Dearborn, 4901 Evergreen Road, Dearborn, MI 48128, United States of America.

Industrial and Operations Engineering, University of Michigan, 1205 Beal Avenue, Ann Arbor, MI 48109, United States of America.

Publication information

Inf Process Manag. 2021 Jul;58(4):102569. doi: 10.1016/j.ipm.2021.102569. Epub 2021 Mar 6.

Abstract

Misinformation about COVID-19 has been prevalent on social media as the pandemic unfolds, and the associated risks are extremely high. Thus, it is critical to detect and combat such misinformation. Recently, deep learning models using natural language processing techniques, such as BERT (Bidirectional Encoder Representations from Transformers), have achieved great success in detecting misinformation. In this paper, we proposed an explainable natural language processing model based on DistilBERT and SHAP (Shapley Additive exPlanations), chosen for their efficiency and effectiveness, to combat misinformation about COVID-19. First, we collected a dataset of 984 fact-checked claims about COVID-19. By augmenting the data using back-translation, we doubled the sample size of the dataset, and the DistilBERT model obtained good performance (accuracy: 0.972; area under the curve: 0.993) in detecting misinformation about COVID-19. Our model was also tested on a larger dataset from the AAAI2021 - COVID-19 Fake News Detection Shared Task and obtained good performance (accuracy: 0.938; area under the curve: 0.985). The performance on both datasets was better than that of traditional machine learning models. Second, in order to boost public trust in model prediction, we employed SHAP to improve model explainability, which was further evaluated using a between-subjects experiment with three conditions, i.e., text (T), text+SHAP explanation (TSE), and text+SHAP explanation+source and evidence (TSESE). The participants were significantly more likely to trust and share information related to COVID-19 in the TSE and TSESE conditions than in the T condition. Our results have useful implications for detecting misinformation about COVID-19 and improving public trust.
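
To make the idea behind a SHAP-style explanation concrete, the following is a minimal sketch, not the authors' implementation. It computes exact Shapley values for the tokens of one claim by enumerating every token subset. The scorer `toy_score`, its keyword weights, and the interaction bonus are hypothetical stand-ins for a trained classifier such as DistilBERT, and the SHAP library itself uses approximations rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

# Hypothetical keyword weights; a stand-in for a trained classifier's output.
TOY_WEIGHTS = {"miracle": 2.0, "cure": 1.5, "vaccine": -0.5}


def toy_score(tokens):
    """Hypothetical misinformation score for a list of tokens."""
    score = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    # Hypothetical interaction: the two words together are more suspicious.
    if "miracle" in tokens and "cure" in tokens:
        score += 1.0
    return score


def shapley_values(tokens, score):
    """Exact Shapley value of each token position via full enumeration.

    Cost grows exponentially with the number of tokens, so this is only
    practical for very short inputs.
    """
    n = len(tokens)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that excludes token i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_i = [tokens[j] for j in subset]
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                values[i] += weight * (score(with_i) - score(without_i))
    return values


if __name__ == "__main__":
    claim = ["miracle", "cure", "for", "covid"]
    for token, value in zip(claim, shapley_values(claim, toy_score)):
        print(f"{token}: {value:+.3f}")
```

In this toy setup the attributions add up to the score of the full claim minus the score of an empty input, and the interaction bonus is split equally between the two tokens that trigger it. Per-token attributions of this kind are what a SHAP explanation displays alongside the text.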

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/01b8/7980090/a7fef0b2b43c/gr1_lrg.jpg
