Department of Psychology, Northwestern University, Evanston, IL, USA.
Behav Res Methods. 2014 Sep;46(3):786-97. doi: 10.3758/s13428-013-0402-6.
The method of oversampling data from a preselected range of a variable's distribution is often applied by researchers who wish to study rare outcomes without substantially increasing sample size. Despite frequent use, however, it is not known whether this method introduces statistical bias due to disproportionate representation of a particular range of data. The present study employed simulated data sets to examine how oversampling introduces systematic bias in effect size estimates (of the relationship between oversampled predictor variables and the outcome variable), as compared with estimates based on a random sample. In general, results indicated that increased oversampling was associated with a decrease in the absolute value of effect size estimates. Critically, however, the actual magnitude of this decrease in effect size estimates was nominal. This finding thus provides the first evidence that the use of the oversampling method does not systematically bias results to a degree that would typically impact results in behavioral research. Examining the effect of sample size on oversampling yielded an additional important finding: For smaller samples, the use of oversampling may be necessary to avoid spuriously inflated effect sizes, which can arise when the number of predictor variables and rare outcomes is comparable.
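The comparison described above, effect size estimates from an oversampled sample versus a random sample, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' simulation design: it assumes a single standard-normal predictor, a linear relationship with a known population correlation, and oversampling from the upper tail of the predictor. The function names (`pearson_r`, `draw_case`, `simulate`) and all parameter values (`true_r`, `cutoff`, `oversample_frac`, `reps`) are hypothetical choices made for illustration only.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def draw_case(rng, true_r, min_x=None):
    """Draw one (x, y) pair with population correlation true_r.

    If min_x is given, x is redrawn until x >= min_x, which stands in
    for sampling from a preselected range of the predictor (assumption).
    """
    while True:
        x = rng.gauss(0.0, 1.0)
        if min_x is None or x >= min_x:
            break
    y = true_r * x + math.sqrt(1.0 - true_r ** 2) * rng.gauss(0.0, 1.0)
    return x, y


def simulate(true_r=0.3, n=200, oversample_frac=0.5, cutoff=1.645,
             reps=200, seed=1):
    """Return (mean r in random samples, mean r in oversampled samples)."""
    rng = random.Random(seed)
    n_over = round(n * oversample_frac)
    random_rs, oversampled_rs = [], []
    for _ in range(reps):
        # Purely random sample of size n.
        rand = [draw_case(rng, true_r) for _ in range(n)]
        # Same total size, but a fraction comes from the preselected range.
        over = [draw_case(rng, true_r, cutoff) for _ in range(n_over)]
        over += [draw_case(rng, true_r) for _ in range(n - n_over)]
        random_rs.append(pearson_r(*zip(*rand)))
        oversampled_rs.append(pearson_r(*zip(*over)))
    return sum(random_rs) / reps, sum(oversampled_rs) / reps


if __name__ == "__main__":
    for frac in (0.0, 0.25, 0.5, 0.75):
        mean_random, mean_over = simulate(oversample_frac=frac)
        print(f"oversampled fraction {frac:.2f}: "
              f"random r = {mean_random:.3f}, oversampled r = {mean_over:.3f}")
```

Running the script prints the mean estimate under each sampling scheme for several oversampling fractions, so the size of any difference can be inspected directly. Because this toy setup selects on a continuous predictor rather than modeling rare outcomes, the direction and size of the difference it shows need not match the pattern reported in the abstract.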