Bloom Ronald M, Buckeridge David L, Cheng Karen E
McGill Clinical and Health Informatics, Department of Epidemiology and Biostatistics, McGill University, 1140 Pine Avenue West, Montreal, Quebec H3A 1A3.
J Am Med Inform Assoc. 2007 Jan-Feb;14(1):76-85. doi: 10.1197/jamia.M2178. Epub 2006 Oct 26.
Bioterrorism and emerging infectious diseases such as influenza have spurred research into rapid outbreak detection. One primary thrust of this research has been to identify data sources that give early indication of a disease outbreak by acting as leading indicators relative to established data sources. Researchers tend to rely on the sample cross-correlation function (CCF) to quantify the association between two data sources. Medical informatics researchers have, however, given little consideration to how methodological choices affect the ability of the CCF to identify a lead-lag relationship between time series. We draw on experience from the econometric and environmental health communities, and we use simulation to demonstrate that the sample CCF is highly prone to bias. Specifically, long-scale phenomena tend to overwhelm the CCF, obscuring phenomena at shorter wavelengths. Researchers seeking lead-lag relationships in surveillance data must therefore stipulate the scale length of the features of interest (e.g., short-scale spikes versus long-scale seasonal fluctuations) and then filter the data appropriately to diminish the influence of other features, which may mask the features of interest. Otherwise, conclusions drawn from the sample CCF of bivariate time-series data will inevitably be ambiguous and often altogether misleading.
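The masking effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design; every number in it (series length, seasonal amplitude, a 14-day seasonal offset, a 3-day short-scale lead, a 15-day filter window) is an assumption chosen for illustration, and subtracting a centered moving average is used here only as one simple stand-in for "filtering the data appropriately". Two series share short-scale fluctuations in which x leads y by 3 days, while their much larger seasonal cycles are offset in the opposite direction.

```python
import math
import random


def sample_ccf(x, y, max_lag):
    """Return {lag: r} for lags -max_lag..max_lag.

    Positive lag k correlates x[t] with y[t + k], so a peak at k > 0
    means x leads y by k time steps.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    scale = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    ccf = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            pairs = zip(dx[:n - k], dy[k:])
        else:
            pairs = zip(dx[-k:], dy[:n + k])
        ccf[k] = sum(a * b for a, b in pairs) / scale
    return ccf


def high_pass(x, window):
    """Subtract a centered moving average (odd window) to strip
    long-scale structure; edge points lacking a full window are dropped."""
    half = window // 2
    return [x[t] - sum(x[t - half:t + half + 1]) / window
            for t in range(half, len(x) - half)]


def peak_lag(ccf):
    """Lag at which the sample CCF is largest."""
    return max(ccf, key=ccf.get)


# Illustrative (assumed) parameters, in days.
random.seed(0)
n, true_lead, season_shift = 730, 3, 14

# Short-scale feature: x sees each fluctuation 3 days before y does.
spikes = [random.gauss(0, 1) for _ in range(n + true_lead)]

# Long-scale feature: y's seasonal cycle runs 14 days ahead of x's.
x = [10 * math.sin(2 * math.pi * t / 365) + spikes[t + true_lead]
     for t in range(n)]
y = [10 * math.sin(2 * math.pi * (t + season_shift) / 365) + spikes[t]
     + random.gauss(0, 0.3) for t in range(n)]

raw_peak = peak_lag(sample_ccf(x, y, 30))
filtered_peak = peak_lag(sample_ccf(high_pass(x, 15), high_pass(y, 15), 30))
print("raw CCF peak lag:", raw_peak)
print("filtered CCF peak lag:", filtered_peak)
```

Under these assumptions the raw CCF is dominated by the seasonal cycles and peaks at a negative lag, which would suggest that y leads x. Once the long-scale component is removed, the peak moves to the short-scale lead built into the simulation. Which answer is "right" depends on the scale of the feature the analyst cares about, which is the point the abstract makes.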