Building test data from real outbreaks for evaluating detection algorithms.

Author information

Texier Gaetan, Jackson Michael L, Siwe Leonel, Meynard Jean-Baptiste, Deparis Xavier, Chaudet Herve

Affiliations

Pasteur Center in Cameroun, Yaoundé, Cameroun.

UMR 912 / SESSTIM - INSERM/IRD/Aix-Marseille University / Faculty of Medicine - 27, Bd Jean Moulin, Marseille, France.

Publication information

PLoS One. 2017 Sep 1;12(9):e0183992. doi: 10.1371/journal.pone.0183992. eCollection 2017.

Abstract

Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, sizes, and durations, is known to be very difficult. The dataset of generated outbreak signals should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach that uses historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by a resampling process (Binomial, Inverse Transform Sampling Method (ITSM), Metropolis-Hastings Random Walk, Metropolis-Hastings Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the input parameters that matter most for simulation quality and to evaluate the performance of each resampling algorithm. Our analysis confirms that the type of algorithm used and the simulation parameters (i.e. days, number of cases, outbreak shape, overall scale factor) influence the results. We show that, regardless of the outbreaks, algorithms, and metrics chosen for the evaluation, simulation quality decreased as the number of simulated days increased and improved as the number of simulated cases increased. Simulating outbreaks with fewer cases than days of duration (i.e. an overall scale factor less than 1) resulted in a substantial loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, the binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within the range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a broad spectrum of outbreak signals.
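The abstract does not give implementation details, so the following is only a minimal sketch of one possible reading of the two-step idea: rescale a historical epidemic curve to a target duration, then resample a target number of cases by inverse transform sampling. The function names `rescale_curve` and `resample_itsm`, the use of linear interpolation for the rescaling step, and the example counts are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def rescale_curve(historical_counts, target_days):
    """Stretch or compress a historical curve to target_days.

    Assumption: the time axis is rescaled by linear interpolation and the
    result is normalised into a daily probability distribution.
    """
    counts = np.asarray(historical_counts, dtype=float)
    # Place the original and target days on a common [0, 1] time axis.
    src = np.linspace(0.0, 1.0, len(counts))
    dst = np.linspace(0.0, 1.0, target_days)
    stretched = np.interp(dst, src, counts)
    return stretched / stretched.sum()


def resample_itsm(daily_probs, n_cases, rng):
    """Draw n_cases onset days by inverse transform sampling.

    Each uniform draw is mapped to a day through the cumulative
    distribution; the draws are then tallied into daily counts.
    """
    cdf = np.cumsum(daily_probs)
    cdf[-1] = 1.0  # guard against floating-point drift
    u = rng.random(n_cases)
    days = np.searchsorted(cdf, u, side="right")
    return np.bincount(days, minlength=len(daily_probs))


# Hypothetical historical outbreak: 8 days, 38 cases.
historical = [1, 3, 8, 12, 7, 4, 2, 1]
rng = np.random.default_rng(42)

probs = rescale_curve(historical, target_days=12)
simulated = resample_itsm(probs, n_cases=60, rng=rng)
```

Here the simulated outbreak has 60 cases over 12 days. If the overall scale factor is read as cases divided by days, as the abstract's "fewer cases than days" remark suggests, this example has a factor of 5; a factor below 1 would leave most days empty, which is consistent with the information loss the abstract reports.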

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a9e/5593515/eac05c647762/pone.0183992.g001.jpg
