Improved prediction accuracy for disease risk mapping using Gaussian process stacked generalization.
Author information
Department of Infectious Disease Epidemiology, Imperial College London, London W2 1PG, UK
Oxford Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK.
Publication information
J R Soc Interface. 2017 Sep;14(134). doi: 10.1098/rsif.2017.0520.
Maps of infectious disease, charting spatial variations in the force of infection, degree of endemicity and the burden on human health, provide an essential evidence base to support planning towards global health targets. Contemporary disease mapping efforts have embraced statistical modelling approaches to properly acknowledge uncertainties in both the available measurements and their spatial interpolation. The most common such approach is Gaussian process regression, a mathematical framework composed of two components: a mean function harnessing the predictive power of multiple independent variables, and a covariance function yielding spatio-temporal shrinkage against residual variation from the mean. Though many techniques have been developed to improve the flexibility and fitting of the covariance function, models for the mean function have typically been restricted to simple linear terms. For infectious diseases, known to be driven by complex interactions between environmental and socio-economic factors, improved modelling of the mean function can greatly boost predictive power. Here, we present an ensemble approach based on stacked generalization that allows for multiple nonlinear algorithmic mean functions to be jointly embedded within the Gaussian process framework. We apply this method to mapping prevalence data in sub-Saharan Africa and show that the generalized ensemble approach markedly outperforms any individual method.
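The two-component structure described in the abstract (a stacked ensemble as the mean function, a Gaussian process covariance for the residual spatial variation) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic data, the two base learners (`fit_linear`, `fit_knn`), the constraint that stacking weights are non-negative and sum to one, the squared-exponential kernel, and the fixed kernel hyperparameters. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential covariance between two sets of site coordinates."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def fit_linear(X, y):
    """Base learner 1: ordinary least squares with an intercept."""
    design = lambda Z: np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return lambda Z: design(Z) @ coef

def fit_knn(X, y, k=5):
    """Base learner 2: k-nearest-neighbour average, a simple nonlinear stand-in."""
    def predict(Z):
        d2 = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        nearest = np.argsort(d2, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

def out_of_fold(fit, X, y, n_folds=5, seed=0):
    """Cross-validated predictions, so the stacker never sees in-sample fits."""
    order = np.random.default_rng(seed).permutation(len(X))
    oof = np.empty(len(X))
    for held_out in np.array_split(order, n_folds):
        train = np.setdiff1d(order, held_out)
        oof[held_out] = fit(X[train], y[train])(X[held_out])
    return oof

def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def stack_weights(P, y, n_iter=500):
    """Convex-combination weights by projected gradient descent on squared error."""
    w = np.full(P.shape[1], 1.0 / P.shape[1])
    step = len(y) / np.linalg.norm(P, 2) ** 2  # inverse Lipschitz constant of the gradient
    for _ in range(n_iter):
        w = project_simplex(w - step * P.T @ (P @ w - y) / len(y))
    return w

def gp_predict(s_train, resid, s_new, noise=0.1):
    """GP posterior mean of the residual field at new sites (fixed hyperparameters)."""
    K = rbf_kernel(s_train, s_train) + noise * np.eye(len(s_train))
    return rbf_kernel(s_new, s_train) @ np.linalg.solve(K, resid)

# Synthetic data (an assumption, not the paper's data): a nonlinear covariate
# effect plus a smooth spatial term plus noise.
rng = np.random.default_rng(1)
n = 200
sites = rng.uniform(size=(n, 2))
X = rng.normal(size=(n, 2))
y = (np.sin(2 * X[:, 0]) + 0.5 * X[:, 1]
     + np.sin(6 * sites[:, 0]) + 0.1 * rng.normal(size=n))
train, test = np.arange(150), np.arange(150, n)
learners = [fit_linear, fit_knn]

# Level 0: out-of-fold predictions from each base learner on the training set.
P_oof = np.column_stack([out_of_fold(f, X[train], y[train]) for f in learners])
# Level 1: weights that combine the base learners into a single mean function.
w = stack_weights(P_oof, y[train])

# Refit each learner on all training data to evaluate the mean at new sites.
P_new = np.column_stack([f(X[train], y[train])(X[test]) for f in learners])
resid = y[train] - P_oof @ w  # what the stacked mean leaves unexplained
pred = P_new @ w + gp_predict(sites[train], resid, sites[test])

rmse = np.sqrt(np.mean((pred - y[test]) ** 2))
print("stacking weights:", np.round(w, 3), "hold-out RMSE:", round(float(rmse), 3))
```

The stacking weights are fitted on out-of-fold predictions rather than in-sample fits, which keeps a flexible learner from dominating the ensemble simply because it overfits. In practice the kernel hyperparameters would be estimated rather than fixed, and the ensemble weights could be inferred jointly with the Gaussian process instead of in a separate step.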