Barnett Adrian G, Stephen Dimity, Huang Cunrui, Wolkewitz Martin
Institute of Health and Biomedical Innovation, Queensland University of Technology, 60 Musk Avenue, Kelvin Grove, QLD 4059, Australia.
Environ Res. 2017 Apr;154:222-225. doi: 10.1016/j.envres.2017.01.007. Epub 2017 Jan 17.
Time series data are popular in environmental epidemiology because they exploit the natural experiment of how changes in exposure over time might affect disease. Many published time series papers have used parameter-heavy models that fully explain the second-order patterns in disease, giving residuals with no short-term autocorrelation or seasonality. This is often achieved by including past disease counts as predictors (autoregression) or seasonal splines with many degrees of freedom. These approaches give well-behaved residuals, but add little to our understanding of cause and effect. We argue that modelling approaches should rely more on good epidemiology and less on statistical tests. This includes thinking about causal pathways, making potential confounders explicit, fitting a limited number of models, and not over-fitting at the cost of under-estimating the true association between exposure and disease.