Educational Testing Service, Princeton, NJ, USA.
MITRE Corporation, Bedford, MA, USA.
Behav Res Methods. 2019 Apr;51(2):507-522. doi: 10.3758/s13428-018-1098-4.
The validity of studies investigating interventions to enhance fluid intelligence (Gf) depends on the adequacy of the Gf measures administered. Such studies have yielded mixed results, with a suggestion that Gf measurement issues may be partly responsible. The purpose of this study was to develop a Gf test battery comprising tests meeting the following criteria: (a) strong construct validity evidence, based on prior research; (b) reliable and sensitive to change; (c) varying in item types and content; (d) producing parallel tests, so that pretest-posttest comparisons could be made; (e) appropriate time limits; (f) unidimensional, to facilitate interpretation; and (g) appropriate in difficulty for a high-ability population, to detect change. A battery comprising letter, number, and figure series and figural matrix item types was developed and evaluated in three large-N studies (N = 3,067, 2,511, and 801, respectively). Items were generated algorithmically on the basis of proven item models from the literature, to achieve high reliability at the targeted difficulty levels. An item response theory approach was used to calibrate the items in the first two studies and to establish conditional reliability targets for the tests and the battery. On the basis of those calibrations, fixed parallel forms were assembled for the third study, using linear programming methods. Analyses showed that the tests and test battery achieved the proposed criteria. We suggest that the battery as constructed is a promising tool for measuring the effectiveness of cognitive enhancement interventions, and that its algorithmic item construction enables tailoring the battery to different difficulty targets, for even wider applications.