Per Liv, Svend Erik Mathiassen, Susanne Wulff Svendsen
Department of Occupational and Public Health Sciences, Centre for Musculoskeletal Research, University of Gävle, Gävle, Sweden.
Ann Occup Hyg. 2011 May;55(4):436-49. doi: 10.1093/annhyg/meq095.
To investigate the statistical efficiency of strategies for sampling upper arm elevation data, which differed with respect to sample sizes and sample allocations within and across measurement days. The study was also designed to compare standard theoretical predictions of sampling efficiency, which rely on several assumptions about the data structure, with 'true' efficiency as determined by bootstrap simulations.
Sixty-five sampling strategies were investigated using a data set containing minute-by-minute values of average right upper arm elevation, percentage of time with an arm elevated <15°, and percentage of time with an arm elevated >90° in a population of 23 house painters, 23 car mechanics, and 26 machinists, all followed for four full working days. Total sample times per subject between 30 and 240 min were subdivided into continuous time blocks between 1 and 240 min long, allocated to 1 or 4 days per subject. Within day(s), blocks were distributed using either a random or a fixed-interval principle. Sampling efficiency was expressed in terms of the variance of estimated mean exposure values of 20 subjects and assessed using standard theoretical models assuming independence between variables and homoscedasticity. Theoretical performance was compared to empirical efficiencies obtained by a nonparametric bootstrapping procedure.
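The theoretical benchmark described above can be sketched as follows. This is a minimal sketch, not the study's code. It assumes a standard three-level variance-components model (between subjects, between days within subjects, within days) in which every sampled minute is independent and homoscedastic. The function name `predicted_group_mean_variance` and the variance components in the example are hypothetical, not values from the study.

```python
def predicted_group_mean_variance(var_between_subjects: float,
                                  var_between_days: float,
                                  var_within_day: float,
                                  n_subjects: int,
                                  days_per_subject: int,
                                  minutes_per_day: int) -> float:
    """Predicted variance of the mean exposure of n_subjects subjects.

    Assumes independent, homoscedastic sampling units (here: minutes), so the
    block length into which the minutes are grouped does not enter the formula.
    """
    var_subject_mean = (var_between_subjects
                        + var_between_days / days_per_subject
                        + var_within_day / (days_per_subject * minutes_per_day))
    return var_subject_mean / n_subjects


# Hypothetical variance components (exposure units squared), for illustration only.
components = dict(var_between_subjects=4.0, var_between_days=2.0, var_within_day=100.0)

# 120 min per subject on 1 day vs. 30 min on each of 4 days, 20 subjects.
one_day = predicted_group_mean_variance(**components, n_subjects=20,
                                        days_per_subject=1, minutes_per_day=120)
four_days = predicted_group_mean_variance(**components, n_subjects=20,
                                          days_per_subject=4, minutes_per_day=30)
print(f"1 day: {one_day:.3f}, 4 days: {four_days:.3f}")  # 1 day: 0.342, 4 days: 0.267
```

Under these assumptions, spreading the same total sample time over more days lowers the predicted variance, but block length has no effect at all. That blindness to block length is exactly what the bootstrap comparison in the study put to the test.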
We found the assumptions of independence and homoscedasticity in the theoretical model to be violated, most notably through autocorrelation between measurement units within working days. The empirical variance of the mean exposure estimates decreased, i.e. sampling efficiency increased, for sampling strategies in which measurements were distributed widely across time. Thus, the most efficient allocation strategy was to organize a sample into 1-min blocks collected at fixed time intervals across 4 days. Theoretical estimates of efficiency generally agreed with empirical variances when the sample was allocated into small blocks, whereas for larger block sizes the empirical 'true' variance was considerably larger than predicted by theory. Theory overestimated efficiency in particular for strategies with short total sample times per subject.
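A small simulation can illustrate why within-day autocorrelation penalizes large blocks. It does not reproduce the study's bootstrap of measured arm-elevation data. Instead, it assumes hypothetical minute values that follow a stationary AR(1) process with unit variance and lag-1 autocorrelation `PHI = 0.9` over a 480-min day. It then compares three ways of taking 60 min from that day: fixed-interval 1-min blocks, random 1-min blocks, and one contiguous 60-min block. All names and parameter values are illustrative assumptions.

```python
import random
import statistics
from collections.abc import Callable

DAY_MINUTES = 480     # hypothetical 8-h working day, one value per minute
SAMPLE_MINUTES = 60   # total sample time taken from that day
PHI = 0.9             # hypothetical lag-1 autocorrelation between minutes


def simulate_day(rng: random.Random) -> list[float]:
    """One day of stationary AR(1) minute values with unit marginal variance."""
    innovation_sd = (1.0 - PHI ** 2) ** 0.5
    x = rng.gauss(0.0, 1.0)
    values = [x]
    for _ in range(DAY_MINUTES - 1):
        x = PHI * x + rng.gauss(0.0, innovation_sd)
        values.append(x)
    return values


def fixed_interval_minutes(rng: random.Random) -> list[int]:
    """1-min blocks at a fixed interval, starting at a random offset."""
    step = DAY_MINUTES // SAMPLE_MINUTES
    start = rng.randrange(step)
    return [start + i * step for i in range(SAMPLE_MINUTES)]


def random_minutes(rng: random.Random) -> list[int]:
    """1-min blocks placed at random (without replacement) across the day."""
    return rng.sample(range(DAY_MINUTES), SAMPLE_MINUTES)


def contiguous_block_minutes(rng: random.Random) -> list[int]:
    """A single 60-min block starting at a random time."""
    start = rng.randrange(DAY_MINUTES - SAMPLE_MINUTES + 1)
    return list(range(start, start + SAMPLE_MINUTES))


def empirical_variance(strategy: Callable[[random.Random], list[int]],
                       n_days: int = 1000, seed: int = 1) -> float:
    """Variance of the sampled mean across many simulated days."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_days):
        day = simulate_day(rng)
        means.append(statistics.fmean(day[i] for i in strategy(rng)))
    return statistics.variance(means)


if __name__ == "__main__":
    theory = 1.0 / SAMPLE_MINUTES  # independence: sigma^2 / n with sigma^2 = 1
    for name, strategy in [("fixed-interval 1-min blocks", fixed_interval_minutes),
                           ("random 1-min blocks", random_minutes),
                           ("one 60-min block", contiguous_block_minutes)]:
        print(f"{name}: empirical {empirical_variance(strategy):.4f}, "
              f"independence theory {theory:.4f}")
```

With these assumed parameters, the variance for a single contiguous block should come out several times larger than the independence prediction. The 1-min blocks spread at fixed intervals stay much closer to that prediction. This mirrors, qualitatively, the pattern reported for the real data.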
This study demonstrates that when exposure data are autocorrelated within days (which we argue is the major reason why theory overestimates sampling performance), sampling efficiency can be improved by distributing the sample widely across the day or across days, preferably using a fixed-interval strategy. While this guidance is particularly valid when small proportions of working days are assessed, we generally recommend collecting more data than suggested by theory if a certain precision of the resulting exposure estimate is needed. More data per se give better precision, and sampling larger proportions of the working day(s) also alleviates the negative effects of possible autocorrelation in the data.
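A textbook result makes the mechanism invoked in these conclusions concrete. It is used here only as an illustration, not as the study's model. Assume minute values within a day follow a stationary AR(1) process with variance σ² and lag-1 autocorrelation ρ, and that n one-minute units are sampled s minutes apart (s = 1 for one contiguous block). The variance of their mean is then

```latex
\operatorname{Var}(\bar{x}_n) = \frac{\sigma^2}{n}\left[1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho^{s k}\right]
```

The independence-based prediction corresponds to setting the bracket to 1, giving σ²/n. For a contiguous block (s = 1) with ρ close to 1, the bracket can be much larger than 1, so theory overstates efficiency. Widening the spacing s drives ρ^{sk} toward 0 and the bracket toward 1, which is why spreading the sample across the day helps.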