Trahan Lisa H, Stuebing Karla K, Fletcher Jack M, Hiscock Merrill
Department of Psychology, University of Houston.
Texas Institute for Measurement, Evaluation, and Statistics, University of Houston.
Psychol Bull. 2014 Sep;140(5):1332-60. doi: 10.1037/a0037173. Epub 2014 Jun 30.
The Flynn effect refers to the observed rise in IQ scores over time, which results in norms obsolescence. Although the Flynn effect is widely accepted, most efforts to estimate it have relied upon "scorecard" approaches that make estimates of its magnitude and error of measurement controversial and prevent determination of factors that moderate the Flynn effect across different IQ tests. We conducted a meta-analysis to determine the magnitude of the Flynn effect with a higher degree of precision, to determine the error of measurement, and to assess the impact of several moderator variables on the mean effect size. Across 285 studies (N = 14,031) since 1951 with administrations of 2 intelligence tests with different normative bases, the meta-analytic mean was 2.31, 95% CI [1.99, 2.64], standard score points per decade. The mean effect size for 53 comparisons (N = 3,951, excluding 3 atypical studies that inflate the estimates) involving modern (since 1972) Stanford-Binet and Wechsler IQ tests (2.93, 95% CI [2.3, 3.5], IQ points per decade) was comparable to previous estimates of about 3 points per decade but was not consistent with the hypothesis that the Flynn effect is diminishing. For modern tests, study sample (larger increases for validation research samples vs. test standardization samples) and order of administration explained unique variance in the Flynn effect, but age and ability level were not significant moderators. These results supported previous estimates of the Flynn effect and its robustness across different age groups, measures, samples, and levels of performance.
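To make "norms obsolescence" concrete, the per-decade estimates above can be turned into an expected score inflation for a test whose norms are a given number of years old. The following is a minimal sketch, not a procedure from the study: it assumes the rise is linear and constant over time, and the function names `norm_inflation` and `inflation_interval` are hypothetical. The rate of 2.31 points per decade and its 95% CI [1.99, 2.64] are taken from the abstract.

```python
def norm_inflation(years_since_norming: float, points_per_decade: float = 2.31) -> float:
    """Expected score inflation (standard score points) due to outdated norms.

    Assumption (not from the study): the Flynn effect accrues linearly,
    so inflation = rate per decade * (years since norming / 10).
    """
    return points_per_decade * years_since_norming / 10.0


def inflation_interval(years_since_norming: float, ci=(1.99, 2.64)):
    """Apply the same linear assumption to both ends of the reported 95% CI."""
    low, high = ci
    return (
        norm_inflation(years_since_norming, low),
        norm_inflation(years_since_norming, high),
    )


if __name__ == "__main__":
    years = 20  # hypothetical: test normed 20 years before administration
    print(round(norm_inflation(years), 2))  # 4.62
    low, high = inflation_interval(years)
    print(round(low, 2), round(high, 2))  # 3.98 5.28
```

Under these assumptions, norms that are 20 years old would overstate scores by roughly 4.6 points on average. Passing `points_per_decade=2.93` applies the estimate for modern Stanford-Binet and Wechsler tests instead.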