CoViNAR: a context-aware social media dataset for pandemic severity level prediction and analysis.

Authors

Shafiya Soofi, Wani Mudasir Ahmad, Jabin Suraiya, ELAffendi Mohammad

Affiliations

Department of Computer Science, Faculty of Sciences, Jamia Millia Islamia, New Delhi, India.

EIAS Data Science & Blockchain Laboratory, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia.

Publication information

Front Artif Intell. 2025 Aug 20;8:1623090. doi: 10.3389/frai.2025.1623090. eCollection 2025.

Abstract

INTRODUCTION

The unprecedented COVID-19 pandemic exposed critical weaknesses in global health management, particularly in resource allocation and demand forecasting. This study aims to enhance pandemic preparedness by leveraging real-time social media analysis to detect and monitor resource needs.

METHODS

Over 27.5 million tweets posted between November 2019 and March 2023 were collected with SnScrape using COVID-19-related hashtags. Tweets from April 2021, a peak pandemic period, were selected to create the CoViNAR dataset. BERTopic was used for context-aware filtering, yielding a novel dataset of 14,000 annotated tweets categorized as "Need", "Availability", or "Not-relevant". The CoViNAR dataset was then used to train several machine learning classifiers, with experiments conducted using three context-aware word embedding techniques.
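The selection and labelling steps described above can be illustrated with a minimal sketch. It assumes a hypothetical `Tweet` record and helper names (`select_month`, `annotate`) that do not come from the paper, and the example tweets are invented. The BERTopic filtering and the actual annotation procedure are not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# The three categories named in the abstract.
LABELS = ("Need", "Availability", "Not-relevant")


@dataclass
class Tweet:
    """Hypothetical record layout; the real CoViNAR schema may differ."""
    tweet_id: str
    created_at: datetime
    text: str
    label: Optional[str] = None


def select_month(tweets: List[Tweet], year: int, month: int) -> List[Tweet]:
    """Keep only tweets posted in the given calendar month."""
    return [
        t for t in tweets
        if t.created_at.year == year and t.created_at.month == month
    ]


def annotate(tweet: Tweet, label: str) -> Tweet:
    """Attach one of the three allowed labels, rejecting anything else."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    tweet.label = label
    return tweet


# Invented example tweets, for illustration only.
corpus = [
    Tweet("1", datetime(2021, 4, 20), "Urgently need an oxygen cylinder"),
    Tweet("2", datetime(2021, 4, 22), "ICU beds available at the city hospital"),
    Tweet("3", datetime(2020, 3, 1), "Stay home, stay safe"),
]

april_2021 = select_month(corpus, 2021, 4)
annotate(april_2021[0], "Need")
annotate(april_2021[1], "Availability")
```

Restricting the label set in one place keeps later training code from silently accepting misspelled categories.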

RESULTS

The best classifier, trained with DistilBERT embeddings, achieved 96.42% accuracy, 96.44% precision, 96.42% recall, and a 96.43% F1-score on the test set. Temporal analysis of classified tweets from the US, UK, and India between November 2019 and March 2023 revealed a strong correlation between "Need/Availability" tweet counts and COVID-19 case surges.
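For readers who want to see how four such scores relate for a three-class problem, here is a minimal sketch. The abstract does not state the averaging scheme, so support-weighted averaging is an assumption here. Under that scheme recall always equals accuracy, which is consistent with the identical 96.42% figures reported, though it does not confirm the authors' choice. The labels below are toy values, not CoViNAR data.

```python
from collections import Counter

LABELS = ("Need", "Availability", "Not-relevant")


def weighted_metrics(y_true, y_pred, labels=LABELS):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / n  # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only.
y_true = ["Need", "Need", "Availability",
          "Not-relevant", "Not-relevant", "Availability"]
y_pred = ["Need", "Availability", "Availability",
          "Not-relevant", "Not-relevant", "Availability"]

metrics = weighted_metrics(y_true, y_pred)
```

In practice a library routine such as scikit-learn's `precision_recall_fscore_support` with `average="weighted"` would normally replace the hand-written loop.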

DISCUSSION

The results demonstrate the effectiveness of the proposed approach in capturing real-time indicators of resource shortages and availability. The strong correlation with case surges underscores its potential as a proactive tool for public health authorities, enabling improved resource allocation and early crisis intervention during pandemics.
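The kind of check behind a claim of correlation with case surges can be sketched as follows. The abstract does not say which correlation measure was used, so the Pearson coefficient is an assumption, and the weekly counts below are invented for illustration rather than taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented weekly counts: "Need"/"Availability" tweets and new cases.
weekly_tweets = [120, 340, 910, 1500, 800, 300]
weekly_cases = [1000, 3000, 8000, 14000, 7500, 2500]

r = pearson(weekly_tweets, weekly_cases)
```

A coefficient close to 1 on real data would indicate that tweet volume rises and falls together with case counts. Using the signal for early warning would additionally require comparing the series at different time lags.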

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f136/12405228/01314e5279fe/frai-08-1623090-g0001.jpg
