Johnson Wendy, Deary Ian J, Bouchard Thomas J
University of Edinburgh, Edinburgh, UK.
Steamboat Springs, CO, USA.
Educ Psychol Meas. 2018 Dec;78(6):1021-1055. doi: 10.1177/0013164417736092. Epub 2017 Oct 26.
Most study samples show less variability in key variables than their source populations, most often because of indirect selection into study participation associated with a wide range of personal and circumstantial characteristics. Formulas exist to correct the resulting distortions of population-level correlations. The accuracy of these formulas has been tested using simulated normally distributed data, but empirical data are rarely available for testing. We tested them in a rare data set in which this was possible: the 6-Day Sample, a representative subsample of 1,208 from the Scottish Mental Survey 1947 of cognitive ability in 1936-born Scottish schoolchildren (70,805). 6-Day Sample participants completed a follow-up assessment in childhood and were re-recruited for study at age 77 years. We compared correlations among early-life variables in the full 6-Day Sample with the range-restricted correlations in the later-participating subsample, before and after adjustment for direct and indirect range restriction. Results differed, especially for two highly correlated cognitive tests; neither reproduced full-sample correlations well, owing to small deviations from normal distribution in skew and kurtosis. Maximum likelihood estimates did little better. To assess how typical these results were, we simulated sample selection and made similar comparisons using the 42 cognitive ability tests administered in the Minnesota Study of Twins Reared Apart, with very similar results. We discuss problems in developing further adjustments to offset range-restriction distortions and possible approaches to solutions.
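As an illustration of the kind of correction formula the abstract refers to, the sketch below applies the classic correction for direct range restriction (often called Thorndike's Case 2) to simulated data. The abstract does not state which formulas the authors used, so this choice is an assumption. The function names, the selection rule (keep cases with x > 0), and the simulated normal data are also hypothetical. Normal data are the favorable case; the abstract reports that small departures from normality degrade such corrections in real samples.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_sd(xs):
    """Sample standard deviation (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))


def correct_direct_restriction(r_restricted, sd_full, sd_restricted):
    """Direct range restriction correction (Thorndike Case 2).

    u is the ratio of the unrestricted to the restricted standard
    deviation of the variable on which selection occurred.
    """
    u = sd_full / sd_restricted
    return r_restricted * u / math.sqrt(1 + r_restricted ** 2 * (u ** 2 - 1))


def simulate(rho=0.6, n=20000, seed=0):
    """Hypothetical demo: select on x, then correct the attenuated r."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
    # Direct selection: only cases above the mean of x "participate".
    kept = [(xi, yi) for xi, yi in zip(x, y) if xi > 0]
    xr, yr = zip(*kept)
    r_full = pearson_r(x, y)
    r_restricted = pearson_r(xr, yr)
    r_corrected = correct_direct_restriction(
        r_restricted, sample_sd(x), sample_sd(xr)
    )
    return r_full, r_restricted, r_corrected


if __name__ == "__main__":
    full, restricted, corrected = simulate()
    print(f"full-sample r: {full:.3f}")
    print(f"restricted r:  {restricted:.3f}")
    print(f"corrected r:   {corrected:.3f}")
```

With normally distributed data and selection directly on one of the two variables, the corrected value should land close to the full-sample correlation. Indirect selection, which the abstract identifies as the more common situation, requires different formulas that this sketch does not cover.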