School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth, PL48AA, UK.
Sci Rep. 2023 Apr 15;13(1):6170. doi: 10.1038/s41598-023-33141-y.
The Covid-19 pandemic presents a serious threat to people's health, resulting in over 250 million confirmed cases and over 5 million deaths globally. To reduce the burden on national health care systems and to mitigate the effects of the outbreak, accurate modelling and forecasting methods for short- and long-term health demand are needed to inform government interventions aiming at curbing the pandemic. Current research on Covid-19 is typically based on a single source of information, specifically on structured historical pandemic data. Other studies are exclusively focused on unstructured online retrieved insights, such as data available from social media. However, the combined use of structured and unstructured information is still uncharted. This paper aims at filling this gap, by leveraging historical and social media information with a novel data integration methodology. The proposed approach is based on vine copulas, which allow us to exploit the dependencies between different sources of information. We apply the methodology to combine structured datasets retrieved from official sources and a big unstructured dataset of information collected from social media. The results show that the combined use of official and online generated information contributes to yield a more accurate assessment of the evolution of the Covid-19 pandemic, compared to the sole use of official data.
Covid-19 大流行对人们的健康构成了严重威胁,在全球范围内导致了超过 2.5 亿例确诊病例和超过 500 万人死亡。为了减轻国家卫生保健系统的负担,并减轻疫情的影响,需要使用准确的短期和长期卫生需求建模和预测方法,为旨在遏制大流行的政府干预措施提供信息。目前针对 Covid-19 的研究通常基于单一信息来源,特别是结构化的历史大流行数据。其他研究则专门侧重于非结构化的在线检索见解,例如社交媒体上提供的数据。但是,结构化和非结构化信息的结合使用仍未被探索。本文旨在通过利用历史和社交媒体信息并结合新颖的数据集成方法来填补这一空白。该方法基于藤蔓 Copula,这使我们能够利用不同信息源之间的依赖关系。我们应用该方法将从官方来源检索的结构化数据集和从社交媒体收集的大型非结构化信息数据集结合在一起。结果表明,与仅使用官方数据相比,官方和在线生成信息的结合使用有助于更准确地评估 Covid-19 大流行的演变。