

Data pipelines in a public health emergency: The human in the machine.

Affiliations

School of Public Health, Imperial College London, Praed Street, London, United Kingdom.


Publication Information

Epidemics. 2023 Jun;43:100676. doi: 10.1016/j.epidem.2023.100676. Epub 2023 Mar 8.

Abstract

In an emergency epidemic response, data providers supply data on a best-faith effort to modellers and analysts who are typically the end user of data collected for other primary purposes such as to inform patient care. Thus, modellers who analyse secondary data have limited ability to influence what is captured. During an emergency response, models themselves are often under constant development and require both stability in their data inputs and flexibility to incorporate new inputs as novel data sources become available. This dynamic landscape is challenging to work with. Here we outline a data pipeline used in the ongoing COVID-19 response in the UK that aims to address these issues. A data pipeline is a sequence of steps to carry the raw data through to a processed and useable model input, along with the appropriate metadata and context. In ours, each data type had an individual processing report, designed to produce outputs that could be easily combined and used downstream. Automated checks were in-built and added as new pathologies emerged. These cleaned outputs were collated at different geographic levels to provide standardised datasets. Finally, a human validation step was an essential component of the analysis pathway and permitted more nuanced issues to be captured. This framework allowed the pipeline to grow in complexity and volume and facilitated the diverse range of modelling approaches employed by researchers. Additionally, every report or modelling output could be traced back to the specific data version that informed it ensuring reproducibility of results. Our approach has been used to facilitate fast-paced analysis and has evolved over time. Our framework and its aspirations are applicable to many settings beyond COVID-19 data, for example for other outbreaks such as Ebola, or where routine and regular analyses are required.
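The pipeline stages the abstract describes — an individual processing step per data type, in-built automated checks, collation at a geographic level, and traceability of every output back to a specific data version — can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation; the function names, the non-negative-count check, and the hash-based versioning are all assumptions introduced for the example.

```python
import hashlib
import json

def check_no_negative_counts(records):
    """Automated check (hypothetical example): drop rows with negative counts."""
    return [r for r in records if r["cases"] >= 0]

def process_source(name, records, checks):
    """Individual processing step for one data type; new checks can be
    appended to `checks` as new pathologies emerge."""
    cleaned = records
    for check in checks:
        cleaned = check(cleaned)
    return {"source": name, "records": cleaned}

def collate(processed, level="region"):
    """Collate cleaned outputs at a chosen geographic level."""
    totals = {}
    for block in processed:
        for r in block["records"]:
            key = r[level]
            totals[key] = totals.get(key, 0) + r["cases"]
    return totals

def data_version(processed):
    """Hash of the processed inputs, so a downstream model output can be
    traced back to the exact data version that informed it."""
    payload = json.dumps(processed, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

# Illustrative raw feed (fabricated values, not study data):
raw = [
    {"region": "London", "cases": 10},
    {"region": "London", "cases": -1},   # entry the automated check removes
    {"region": "North West", "cases": 7},
]
processed = [process_source("hospital_admissions", raw,
                            [check_no_negative_counts])]
print(collate(processed))      # standardised dataset at region level
print(data_version(processed)) # version tag to record alongside model outputs
```

The human validation step the abstract emphasises would sit after `collate`, where an analyst inspects the standardised outputs for nuanced issues the automated checks cannot capture.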

