IRMA: the 335-million-word Italian coRpus for studying MisinformAtion.

Authors

Carrella Fabio, Miani Alessandro, Lewandowsky Stephan

Affiliations

School of Psychological Science, University of Bristol.

Institute of Work and Organizational Psychology, University of Neuchâtel.

Publication information

Proc Conf Assoc Comput Linguist Meet. 2023 May;2023:2339-2349. Epub 2023 May 1.

PMID: 37997575
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7615326/
Abstract

The dissemination of false information on the internet has received considerable attention over the last decade. Misinformation often spreads faster than mainstream news, making manual fact-checking inefficient or, at best, labor-intensive. There is therefore an increasing need to develop methods for the automatic detection of misinformation. Although resources for creating such methods are available in English, other languages are often underrepresented in this effort. With this contribution, we present IRMA, a corpus containing over 600,000 Italian news articles (335+ million tokens) collected from 56 websites classified as 'untrustworthy' by professional fact-checkers. The corpus is freely available and comprises a rich set of text- and website-level data, representing a turnkey resource to test hypotheses and develop automatic detection algorithms. It contains texts, titles, and dates (from 2004 to 2022), along with three types of semantic measures (i.e., keywords, topics at three different resolutions, and LIWC lexical features). IRMA also includes domain-specific information such as source type (e.g., political, health, conspiracy), quality, and higher-level metadata, including several metrics of website incoming traffic that allow the investigation of users' online behavior. IRMA constitutes the largest corpus of misinformation available today in Italian, making it a valid tool for advancing quantitative research on untrustworthy news detection and ultimately helping limit the spread of misinformation.
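
As a rough illustration of how the text-level and website-level fields described in the abstract might be combined, the following Python sketch counts articles per publication year, optionally restricted to one source type. The record layout, the field names (`website`, `source_type`, `published`, `title`, `text`), and the helper `articles_per_year` are all hypothetical and are not taken from the IRMA release; the sample records are invented placeholders, not corpus data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    """One corpus record. Every field name here is an assumption."""
    website: str
    source_type: str  # e.g. "political", "health", "conspiracy"
    published: date
    title: str
    text: str


def articles_per_year(articles, source_type=None):
    """Count articles by publication year, optionally for one source type."""
    counts = Counter()
    for article in articles:
        if source_type is None or article.source_type == source_type:
            counts[article.published.year] += 1
    # Return a plain dict ordered by year for readable output.
    return dict(sorted(counts.items()))


# Invented placeholder records for demonstration only (not from IRMA).
sample = [
    Article("site-a.example", "health", date(2020, 3, 1), "t1", "..."),
    Article("site-a.example", "health", date(2021, 5, 9), "t2", "..."),
    Article("site-b.example", "political", date(2020, 7, 4), "t3", "..."),
]

print(articles_per_year(sample))                        # {2020: 2, 2021: 1}
print(articles_per_year(sample, source_type="health"))  # {2020: 1, 2021: 1}
```

Working with the actual corpus would require adapting the field names to whatever schema the published files use.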