Building a framework for fake news detection in the health domain.

Affiliations

NLP & IR Group, Dpto. Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain.

Instituto Mixto de Investigación - Escuela Nacional de Sanidad (IMIENS), Madrid, Spain.

Publication information

PLoS One. 2024 Jul 8;19(7):e0305362. doi: 10.1371/journal.pone.0305362. eCollection 2024.

Abstract

Disinformation in the medical field is a growing problem that carries significant risk, so it is crucial to detect and combat it effectively. In this article, we provide three elements to aid in this fight: 1) a new framework that collects health-related articles from verification entities and facilitates their annotation for check-worthiness and fact-checking at the sentence level; 2) a corpus generated using this framework, composed of 10,335 sentences annotated for these two concepts and grouped into 327 articles, which we call KEANE (faKe nEws At seNtence lEvel); and 3) a new model for verifying fake news that combines identifiers specific to the medical domain with subject-predicate-object triplets, using Transformers and feedforward neural networks at the sentence level. This model predicts the fact-checking label of each sentence and evaluates the veracity of the entire article. After training this model on our corpus, we obtained F1 scores of 0.749 for check-worthiness and 0.698 for fact-checking in the binary classification of sentences, and 0.703 in the final classification of complete articles. We also tested its performance on another public dataset and found that it performed better than most systems evaluated on that dataset. Moreover, the corpus we provide differs from other existing corpora in its dual sentence-article annotation, which can provide an additional level of justification for the model's prediction of truth or untruth.
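
To make the reported metrics and the sentence-to-article step more concrete, here is a minimal sketch in Python. The `f1_score` function follows the standard definition of binary F1. The abstract does not say which averaging the authors used, nor how sentence labels are combined into an article verdict, so `SentencePrediction`, `article_is_fake`, and the `threshold` parameter are hypothetical names and an assumed aggregation rule for illustration only. This is not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class SentencePrediction:
    """Hypothetical container for the two sentence-level labels."""
    check_worthy: bool  # is the sentence worth verifying?
    is_false: bool      # predicted fact-checking label for the sentence


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """Standard binary F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def article_is_fake(sentences: list[SentencePrediction],
                    threshold: float = 0.5) -> bool:
    """ASSUMED rule, not from the paper: flag the article as fake when the
    share of check-worthy sentences predicted false exceeds `threshold`."""
    checked = [s for s in sentences if s.check_worthy]
    if not checked:
        # Nothing worth checking, so there is no evidence of falsehood.
        return False
    false_share = sum(s.is_false for s in checked) / len(checked)
    return false_share > threshold
```

Under this assumed rule, the sentences that drive the article-level decision can be listed alongside it, which is the kind of sentence-level justification the dual annotation is meant to support.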

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ee0/11230534/d93a4274c618/pone.0305362.g001.jpg
