Department of Environmental Health, Rollins School of Public Health, Emory University, 1518 Clifton Road, NE, Atlanta, GA 30322, USA.
Environ Health. 2012 Sep 20;11:68. doi: 10.1186/1476-069X-11-68.
Estimation of power to assess associations of interest can be challenging for time-series studies of the acute health effects of air pollution because there are two dimensions of sample size (time-series length and daily outcome counts), and because these studies often use generalized linear models to control for complex patterns of covariation between pollutants and time trends, meteorology and possibly other pollutants. In general, statistical software packages for power estimation rely on simplifying assumptions that may not adequately capture this complexity. Here we use simulations to examine the factors that affect power, and we compare the power estimates obtained from simulations with those obtained using statistical software.
Power was estimated by simulation, under specified scenarios, for various analyses within a time-series study of air pollution and emergency department visits. Mean daily emergency department visit counts, model parameter estimates and daily values for air pollution and meteorological variables from actual data (Atlanta, August 1, 1998 to July 31, 1999) were used to generate simulated daily outcome counts with specified temporal associations with air pollutants and randomly generated error based on a Poisson distribution. For each scenario, power was estimated by analyzing the association between simulated daily outcome counts and air pollution in 2,000 simulated data sets. Power estimates from simulations and from statistical software (G*Power and PASS) were compared.
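The simulation procedure described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the authors' code: it uses a single pollutant and an intercept only (no time-trend or meteorology terms), a synthetic pollutant series in place of the Atlanta data, and hypothetical names (`fit_poisson_irls`, `simulate_power`) and parameter values chosen purely for the example. Power is taken as the fraction of simulated data sets in which a two-sided Wald test rejects the null hypothesis of no association.

```python
import numpy as np


def fit_poisson_irls(X, y, n_iter=25, tol=1e-8):
    """Fit a log-link Poisson regression by iteratively reweighted least squares.

    Assumes column 0 of X is the intercept. Returns the coefficient
    estimates and their model-based standard errors.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(max(y.mean(), 1e-8))  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu            # working response
        XtW = X.T * mu                     # Poisson working weights equal mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta)
    cov = np.linalg.inv((X.T * mu) @ X)    # inverse Fisher information
    return beta, np.sqrt(np.diag(cov))


def simulate_power(pollutant, mean_daily_count, beta, n_sims=2000,
                   z_crit=1.96, seed=0):
    """Estimate power as the share of simulated data sets with |Wald z| > z_crit."""
    rng = np.random.default_rng(seed)
    x = pollutant - pollutant.mean()       # centre so the intercept is the log mean count
    X = np.column_stack([np.ones_like(x), x])
    mu = mean_daily_count * np.exp(beta * x)  # specified association with the pollutant
    rejections = 0
    for _ in range(n_sims):
        y = rng.poisson(mu)                # Poisson-distributed random error
        est, se = fit_poisson_irls(X, y)
        if abs(est[1] / se[1]) > z_crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Hypothetical inputs: one year of a standardized pollutant series,
    # 50 visits per day on average, log rate ratio of 0.02 per SD.
    pollutant = np.random.default_rng(1).normal(size=365)
    print(simulate_power(pollutant, mean_daily_count=50.0, beta=0.02))
```

Rerunning `simulate_power` with a longer pollutant series or a larger `mean_daily_count` gives a direct way to compare the two dimensions of sample size under these assumptions.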
In the simulation results, increasing time-series length and increasing average daily outcome counts both increased power to a similar extent. Our results also illustrate the low power that can result from using outcomes with low daily counts or short time series, and the reduction in power that can accompany use of multipollutant models. Power estimates obtained using standard statistical software were very similar to those from the simulations when the software was properly implemented; implementation, however, was not straightforward.
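One way to see why the two dimensions of sample size behave similarly is a standard large-sample Wald approximation for Poisson regression. This is offered as a rough sketch, not as the calculation performed by G*Power or PASS, and it rests on stated assumptions: the effect is small enough that the daily mean is roughly constant, there is no overdispersion, and the share of pollutant variance explained by the other model terms can be summarized by a single hypothetical value `r2_with_covariates`. Under those assumptions the information about the pollutant coefficient is approximately the number of days times the mean daily count times the residual pollutant variance, so power depends on series length and daily counts only through their product.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def approx_power(n_days, mean_daily_count, beta, pollutant_sd,
                 r2_with_covariates=0.0, z_crit=1.959964):
    """Approximate two-sided Wald-test power for a Poisson log-linear model.

    Assumes a roughly constant daily mean and no overdispersion.
    r2_with_covariates is a hypothetical summary of how much pollutant
    variance is explained by other model terms (trends, meteorology,
    co-pollutants); larger values leave less independent variation.
    """
    information = (n_days * mean_daily_count * pollutant_sd ** 2
                   * (1.0 - r2_with_covariates))
    se = 1.0 / sqrt(information)
    z = abs(beta) / se
    return normal_cdf(z - z_crit) + normal_cdf(-z - z_crit)


if __name__ == "__main__":
    # Hypothetical scenarios: same total number of events, split differently.
    print(approx_power(365, 50.0, 0.02, 1.0))   # one year, 50 visits per day
    print(approx_power(730, 25.0, 0.02, 1.0))   # two years, 25 visits per day
    # A co-pollutant correlated with the pollutant of interest reduces power.
    print(approx_power(365, 50.0, 0.02, 1.0, r2_with_covariates=0.5))
```

In this approximation the first two scenarios give the same power because they share the same expected total count, and the third is lower because half of the pollutant's variation is assumed to be absorbed by other terms, which mirrors the multipollutant result in qualitative terms only.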
These analyses demonstrate the similar impact on power of increasing time-series length versus increasing daily outcome counts, which has not previously been reported. Implementation of power software for these studies is discussed and guidance is provided.