Turner Simon L, Forbes Andrew B, Karahalios Amalia, Taljaard Monica, McKenzie Joanne E
School of Public Health and Preventive Medicine, Monash University, 533 St Kilda Road, Melbourne, Victoria, Australia.
Clinical Epidemiology Program, Ottawa Hospital Research Institute, 1053 Carling Ave, Ottawa, Ontario, Canada.
BMC Med Res Methodol. 2021 Aug 28;21(1):181. doi: 10.1186/s12874-021-01364-0.
Interrupted time series (ITS) studies are frequently used to evaluate the effects of population-level interventions or exposures. However, the performance of statistical methods for this design has received relatively little attention.
We simulated continuous data to compare the performance of a set of statistical methods under a range of scenarios that varied the level and slope changes, the length of the series, and the magnitude of lag-1 autocorrelation. We also examined the performance of the Durbin-Watson (DW) test for detecting autocorrelation.
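To make the simulation setting concrete, here is a minimal sketch (not the study's code) of one way to generate a single ITS series from a segmented linear regression with lag-1 autoregressive, AR(1), errors, fit it by OLS, and compute the Durbin-Watson statistic from the residuals. The parameter values (intercept 10, pre-interruption slope 0.1, level change 2, slope change 0.2, rho = 0.4, unit innovation standard deviation), the parametrisation of the slope-change term, and the helper names `simulate_its`, `fit_ols` and `durbin_watson` are hypothetical choices for illustration. It assumes NumPy is available.

```python
import numpy as np


def simulate_its(n_pre, n_post, beta0=10.0, beta1=0.1, level_change=2.0,
                 slope_change=0.2, rho=0.4, sigma=1.0, rng=None):
    """Simulate one ITS series from a segmented linear model with AR(1) errors.

    Hypothetical data-generating model, for illustration only:
        y_t = beta0 + beta1*t + level_change*D_t + slope_change*(t - T_I)*D_t + e_t
        e_t = rho*e_{t-1} + w_t,  w_t ~ N(0, sigma^2)
    where D_t = 1 from the first post-interruption point T_I onwards, else 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = n_pre + n_post
    t = np.arange(1, n + 1)
    post = (t > n_pre).astype(float)          # D_t
    since = (t - (n_pre + 1)) * post          # (t - T_I) * D_t, equal to 0 at T_I
    # Stationary AR(1) errors: draw e_1 from the marginal N(0, sigma^2 / (1 - rho^2)).
    e = np.empty(n)
    e[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - rho ** 2))
    for i in range(1, n):
        e[i] = rho * e[i - 1] + rng.normal(0.0, sigma)
    X = np.column_stack([np.ones(n), t, post, since])
    y = X @ np.array([beta0, beta1, level_change, slope_change]) + e
    return X, y


def fit_ols(X, y):
    """Ordinary least squares coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def durbin_watson(resid):
    """DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2 (roughly 2 * (1 - r1))."""
    resid = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))


rng = np.random.default_rng(2021)
X, y = simulate_its(n_pre=24, n_post=24, rho=0.4, rng=rng)
beta, resid = fit_ols(X, y)
dw = durbin_watson(resid)
print("estimates (intercept, pre-slope, level change, slope change):", np.round(beta, 3))
print("Durbin-Watson statistic:", round(dw, 3))
```

A DW value near 2 is consistent with no lag-1 autocorrelation, and values well below 2 point to positive autocorrelation; the formal test compares the statistic against tabulated lower and upper bounds, which are not shown here.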
All methods yielded unbiased estimates of the level and slope changes over all scenarios. The magnitude of autocorrelation was underestimated by all methods; however, restricted maximum likelihood (REML) yielded the least biased estimates. Underestimation of autocorrelation led to standard errors that were too small and confidence interval coverage below the nominal 95%. All methods performed better with longer time series, except for ordinary least squares (OLS) in the presence of autocorrelation and Newey-West for high values of autocorrelation. The DW test for the presence of autocorrelation performed poorly except for long series with large autocorrelation.
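Why autocorrelation tends to be underestimated in short series can be illustrated with a small Monte Carlo experiment. The sketch below, assuming NumPy, repeatedly simulates a segmented regression with AR(1) errors (true rho = 0.4), fits OLS, and averages the lag-1 autocorrelation of the residuals. This simple residual-based estimator is an illustrative stand-in, not necessarily the estimator used by any of the methods compared in the study; the helper names, coefficient values and series lengths are hypothetical.

```python
import numpy as np


def design(n_pre, n_post):
    """Segmented-regression design: intercept, time, level change, slope change."""
    n = n_pre + n_post
    t = np.arange(1, n + 1)
    post = (t > n_pre).astype(float)
    return np.column_stack([np.ones(n), t, post, (t - (n_pre + 1)) * post])


def ar1_errors(n, rho, sigma, rng):
    """Stationary AR(1) errors e_t = rho*e_{t-1} + w_t, w_t ~ N(0, sigma^2)."""
    e = np.empty(n)
    e[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - rho ** 2))
    for i in range(1, n):
        e[i] = rho * e[i - 1] + rng.normal(0.0, sigma)
    return e


def residual_lag1_autocorr(X, y):
    """Lag-1 autocorrelation of OLS residuals: sum e_t e_{t-1} / sum e_t^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)


def mean_rho_hat(n_pre, n_post, rho, reps=2000, seed=0):
    """Average residual-based estimate of rho over `reps` simulated series."""
    rng = np.random.default_rng(seed)
    X = design(n_pre, n_post)
    beta = np.array([10.0, 0.1, 2.0, 0.2])  # hypothetical true coefficients
    estimates = [
        residual_lag1_autocorr(X, X @ beta + ar1_errors(len(X), rho, 1.0, rng))
        for _ in range(reps)
    ]
    return float(np.mean(estimates))


for half in (6, 12, 24, 48):
    est = mean_rho_hat(half, half, rho=0.4)
    print(f"series length {2 * half:3d}: mean estimated rho = {est:.3f} (true 0.4)")
```

With few points per segment the fitted regression absorbs part of the serial dependence, so the average estimate is expected to fall below 0.4 and to move closer to it as the series lengthens.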
Of the methods evaluated, OLS was the preferred method for series with fewer than 12 points, while REML was preferred for longer series. The DW test should not be relied upon to detect autocorrelation, except when the series is long. Care is needed when interpreting results from all methods, given that confidence intervals will generally be too narrow. Further research is required to develop better-performing methods for ITS, especially for short series.
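The warning that confidence intervals will generally be too narrow can be checked in the same hedged way. The sketch below, assuming NumPy, estimates the coverage of nominal 95% confidence intervals for the level change using OLS model-based standard errors and Newey-West (Bartlett-kernel, no small-sample correction) standard errors, first with independent errors and then with AR(1) errors. The truncation lag of 3, the normal critical value 1.96 (used instead of a t quantile to avoid a SciPy dependency), the 24 + 24 series length, and all helper names are assumptions for illustration, not the study's settings.

```python
import numpy as np


def design(n_pre, n_post):
    """Segmented-regression design: intercept, time, level change, slope change."""
    n = n_pre + n_post
    t = np.arange(1, n + 1)
    post = (t > n_pre).astype(float)
    return np.column_stack([np.ones(n), t, post, (t - (n_pre + 1)) * post])


def ar1_errors(n, rho, sigma, rng):
    """Stationary AR(1) errors e_t = rho*e_{t-1} + w_t, w_t ~ N(0, sigma^2)."""
    e = np.empty(n)
    e[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - rho ** 2))
    for i in range(1, n):
        e[i] = rho * e[i - 1] + rng.normal(0.0, sigma)
    return e


def ols_fit(X, y):
    """OLS coefficients, residuals and (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    return beta, y - X @ beta, XtX_inv


def ols_se(X, resid, XtX_inv):
    """Model-based OLS standard errors, which assume independent errors."""
    n, p = X.shape
    s2 = resid @ resid / (n - p)
    return np.sqrt(np.diag(s2 * XtX_inv))


def newey_west_se(X, resid, XtX_inv, maxlags):
    """Newey-West HAC standard errors with Bartlett weights 1 - lag/(maxlags + 1)."""
    xe = X * resid[:, None]                   # row t is x_t * e_t
    S = xe.T @ xe                             # lag-0 term
    for lag in range(1, maxlags + 1):
        w = 1.0 - lag / (maxlags + 1.0)
        G = xe[lag:].T @ xe[:-lag]            # sum_t (x_t e_t)(x_{t-lag} e_{t-lag})'
        S += w * (G + G.T)
    return np.sqrt(np.diag(XtX_inv @ S @ XtX_inv))


def coverage(n_pre, n_post, rho, reps=2000, maxlags=3, seed=0, z=1.96):
    """Share of nominal 95% CIs for the level change (column 2) containing the truth."""
    rng = np.random.default_rng(seed)
    X = design(n_pre, n_post)
    beta_true = np.array([10.0, 0.1, 2.0, 0.2])  # hypothetical true coefficients
    hits = np.zeros(2)                            # [OLS, Newey-West]
    for _ in range(reps):
        y = X @ beta_true + ar1_errors(len(X), rho, 1.0, rng)
        beta, resid, XtX_inv = ols_fit(X, y)
        ses = (ols_se(X, resid, XtX_inv)[2],
               newey_west_se(X, resid, XtX_inv, maxlags)[2])
        for k, se in enumerate(ses):
            hits[k] += abs(beta[2] - beta_true[2]) <= z * se
    ols_cov, nw_cov = hits / reps
    return float(ols_cov), float(nw_cov)


for rho in (0.0, 0.4):
    ols_cov, nw_cov = coverage(24, 24, rho)
    print(f"rho = {rho}: OLS coverage = {ols_cov:.3f}, Newey-West coverage = {nw_cov:.3f}")
```

OLS coverage noticeably below 0.95 when rho = 0.4 is the pattern the conclusion warns about. The Newey-West correction is expected to help only partly in a series of this length, because it estimates the autocovariances from the same short residual series.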