FitzJohn Richard G, Knock Edward S, Whittles Lilith K, Perez-Guzman Pablo N, Bhatia Sangeeta, Guntoro Fernando, Watson Oliver J, Whittaker Charles, Ferguson Neil M, Cori Anne, Baguelin Marc, Lees John A
MRC Centre for Global Infectious Disease Analysis; and the Abdul Latif Jameel Institute for Disease and Emergency Analytics (J-IDEA), School of Public Health, Imperial College London, London, W2 1PG, UK.
Modelling and Economics Unit, National Infection Service, Public Health England, London, UK.
Wellcome Open Res. 2021 Jun 10;5:288. doi: 10.12688/wellcomeopenres.16466.2. eCollection 2020.
State space models, including compartmental models, are used to model physical, biological and social phenomena in a broad range of scientific fields. A common way of representing the underlying processes in these models is as a system of stochastic processes which can be simulated forwards in time. Inference of model parameters based on observed time-series data can then be performed using sequential Monte Carlo techniques. However, routine use of these methods for inference can be difficult because of several engineering challenges: allowing the model design to change in response to new data and ideas, writing highly performant model code, and combining all of this with up-to-date statistical techniques. Here, we describe a suite of packages in the R programming language designed to streamline the design and deployment of state space models, targeted at infectious disease modellers but suitable for other domains. Users describe their model in a familiar domain-specific language, which is converted into parallelised C++ code. A fast, parallel, reproducible random number generator is then used to run large numbers of model simulations efficiently. We also provide standard inference and prediction routines, though the model simulator can be used directly if these do not meet the user's needs. These packages provide guarantees on reproducibility and performance, allowing the user to focus on the model itself rather than on the underlying computation. For infectious disease modellers, it is crucial to be able to generate high-performance code automatically, because writing and verifying such code by hand is tedious and time-consuming, particularly when further structure is added to compartments. Our packages have been critical to the development cycle of our ongoing real-time modelling efforts in the COVID-19 pandemic, and have the potential to play the same role for models used in a number of other domains.
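To make the combination of forward stochastic simulation and sequential Monte Carlo inference concrete, the following is a minimal sketch using only the Python standard library. It is not the API of the authors' R packages, which generate parallel C++ code. The chain-binomial SIR model, the Poisson observation model on daily incidence, the bootstrap particle filter with multinomial resampling, and all function names (`sir_step`, `simulate`, `particle_filter`, etc.) are illustrative assumptions. Reproducibility here comes from a single seeded generator, which is a simplification of the parallel per-particle random number streams described above.

```python
import math
import random


def binomial(n, p, rng):
    """Draw Binomial(n, p) as a sum of n Bernoulli trials (simple, O(n))."""
    return sum(1 for _ in range(n) if rng.random() < p)


def sir_step(state, beta, gamma, n_pop, dt, rng):
    """Advance one stochastic SIR state by dt (chain-binomial update, assumed model).

    Returns the new (S, I, R) state and the number of new infections."""
    s, i, r = state
    p_inf = 1.0 - math.exp(-beta * i / n_pop * dt)  # per-susceptible infection prob.
    p_rec = 1.0 - math.exp(-gamma * dt)             # per-infective recovery prob.
    new_inf = binomial(s, p_inf, rng)
    new_rec = binomial(i, p_rec, rng)
    return (s - new_inf, i + new_inf - new_rec, r + new_rec), new_inf


def simulate(beta, gamma, n_pop, i0, n_steps, seed):
    """Simulate daily incidence forwards in time from a seeded generator."""
    rng = random.Random(seed)
    state = (n_pop - i0, i0, 0)
    incidence = []
    for _ in range(n_steps):
        state, inc = sir_step(state, beta, gamma, n_pop, 1.0, rng)
        incidence.append(inc)
    return incidence


def poisson_logpmf(y, lam):
    """Log-probability of observing y cases under a Poisson(lam) model."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def particle_filter(data, beta, gamma, n_pop, i0, n_particles, seed):
    """Bootstrap particle filter: returns an estimate of log p(data | beta, gamma)."""
    rng = random.Random(seed)  # one seeded stream => reproducible estimate
    particles = [(n_pop - i0, i0, 0)] * n_particles
    log_lik = 0.0
    for y in data:
        # 1. Propagate every particle forwards one day with the stochastic model.
        stepped = [sir_step(p, beta, gamma, n_pop, 1.0, rng) for p in particles]
        # 2. Weight by the observation model; the small offset avoids log(0)
        #    when a particle produced no new infections.
        log_w = [poisson_logpmf(y, inc + 1e-6) for _, inc in stepped]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]  # log-sum-exp for stability
        log_lik += m + math.log(sum(w) / n_particles)
        # 3. Resample states in proportion to their weights (multinomial).
        particles = rng.choices([st for st, _ in stepped], weights=w, k=n_particles)
    return log_lik


if __name__ == "__main__":
    data = simulate(beta=0.4, gamma=0.1, n_pop=1000, i0=10, n_steps=30, seed=1)
    ll_true = particle_filter(data, 0.4, 0.1, 1000, 10, n_particles=100, seed=42)
    ll_low = particle_filter(data, 0.05, 0.1, 1000, 10, n_particles=100, seed=42)
    print(f"log-likelihood at beta=0.40: {ll_true:.1f}")
    print(f"log-likelihood at beta=0.05: {ll_low:.1f}")
```

Running the filter twice with the same seed returns an identical log-likelihood estimate, which is the reproducibility property the abstract emphasises. In practice, a noisy but unbiased likelihood estimate of this kind is commonly placed inside a particle MCMC sampler to infer the parameters; the sequential, pure-Python loop here is exactly the part the described packages replace with generated, parallel C++.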