Data Exploration and Classification of News Article Reliability: Deep Learning Study.

Authors

Zhan Kevin, Li Yutong, Osmani Rafay, Wang Xiaoyu, Cao Bo

Affiliations

Department of Psychiatry, University of Alberta, Edmonton, AB, Canada.

Department of Cell Biology, University of Alberta, Edmonton, AB, Canada.

Publication information

JMIR Infodemiology. 2022 Sep 22;2(2):e38839. doi: 10.2196/38839. eCollection 2022 Jul-Dec.

DOI:10.2196/38839

PMID:36193330

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9516811/

Abstract

BACKGROUND

During the ongoing COVID-19 pandemic, we are being exposed to large amounts of information each day. This "infodemic" is defined by the World Health Organization as the mass spread of misleading or false information during a pandemic. This spread of misinformation during the infodemic ultimately leads to misunderstandings of public health orders or direct opposition against public policies. Although there have been efforts to combat misinformation spread, current manual fact-checking methods are insufficient to combat the infodemic.

OBJECTIVE

We propose the use of natural language processing (NLP) and machine learning (ML) techniques to build a model that can be used to identify unreliable news articles online.

METHODS

First, we preprocessed the ReCOVery data set to obtain 2029 English news articles tagged with COVID-19 keywords from January to May 2020, which are labeled as reliable or unreliable. Data exploration was conducted to determine major differences between reliable and unreliable articles. We built an ensemble deep learning model using the body text, as well as features, such as sentiment, Empath-derived lexical categories, and readability, to classify the reliability.
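As an illustration of what a readability feature can look like, the sketch below computes the Flesch Reading Ease score for a piece of text. The abstract does not say which readability measure the authors used, so the choice of Flesch Reading Ease is an assumption, as are the function names `count_syllables` and `flesch_reading_ease`. The syllable counter is a crude vowel-group heuristic, not a dictionary-based one. Higher scores indicate text that is easier to read.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    simple = "The cat sat on the mat. The dog ran."
    dense = "Epidemiological misinformation proliferates internationally."
    print(flesch_reading_ease(simple))
    print(flesch_reading_ease(dense))
```

A score like this would be computed per article and passed to the classifier alongside the body text and the other features; how the authors actually combined them is not described in the abstract.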

RESULTS

We found that reliable news articles have a higher proportion of neutral sentiment, while unreliable articles have a higher proportion of negative sentiment. Our analysis also showed that reliable articles are easier to read than unreliable articles and differ from them in lexical categories and keywords. On evaluation, our new model achieved an area under the curve (AUC) of 0.906, a specificity of 0.835, and a sensitivity of 0.945. These values are above the baseline performance of the original ReCOVery model.
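For readers unfamiliar with the reported metrics, the following minimal sketch shows how sensitivity, specificity, and AUC can be computed for a binary classifier. The function names and the toy labels and scores are hypothetical and are not taken from the study; treating label 1 as the positive class is also an assumption, since the abstract does not state which class was positive.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity), with 1 as the positive class (assumption)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    # Sensitivity: share of positives found; specificity: share of negatives found.
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the number
    of samples, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y_true = [1, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.4, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(sensitivity_specificity(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Sensitivity and specificity depend on the decision threshold (0.5 in the toy example), whereas AUC summarizes ranking quality across all thresholds.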

CONCLUSIONS

This paper identified novel differences between reliable and unreliable news articles; moreover, the model was trained using state-of-the-art deep learning techniques. We aim to use our findings to help researchers and the public more easily identify false information and unreliable media in their everyday lives.
