Shono Yusuke, Ece Berivan, Ho Emily H, Kaat Aaron J, LaForte Erica M, Ayturk Ezgi, Gershon Richard
School of Community and Global Health, Claremont Graduate University.
Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine.
Psychol Assess. 2024 Dec;36(12):760-771. doi: 10.1037/pas0001350.
Executive function (EF) has been extensively linked to behavioral, clinical, and educational outcomes. However, few studies have systematically investigated how best to score EF tasks from speed and accuracy performance, particularly how to generate a summary, norm-referenced score. Using data from an updated norming study for the NIH Toolbox Version 3 (NIHTB V3) with the general U.S. population aged 3 to 85 (N = 3,794; 52.3% female; mean age = 25.06, SD = 22.92), we empirically evaluated and compared several scoring algorithms for two EF tests: the Dimensional Change Card Sort Test (a test of cognitive flexibility) and the Flanker Test (a test of inhibitory control). Results showed that joint scoring algorithms integrating speed and accuracy into a single score (namely, the rate-correct score, the linear integrated speed-accuracy score, and the speed-accuracy additive score) provided more robust psychometric evidence for the EF tests than single-index scores of accuracy or speed. These integrated speed-accuracy scores were consistent and stable within and across tasks and over time; they were similar to scores from another well-validated EF measure but, as predicted, unrelated to a crystallized intelligence score; and they increased rapidly from early childhood through late adolescence/early adulthood and then declined toward late adulthood. The rate-correct score was particularly free from ceiling effects and sensitive to age-related changes and variability in EF performance. Among the scoring algorithms compared, we recommend the rate-correct score, which served as the basis for generating the new NIHTB V3 norm-referenced scores, with good test-retest reliability (Dimensional Change Card Sort = .77, Flanker = .81) and acceptable convergent and discriminant validity. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
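The abstract does not spell out how the rate-correct score is computed. As a rough illustration only, the sketch below assumes the definition commonly used in the speed-accuracy literature: the number of correct responses divided by the total response time, giving correct responses per second. The function name, the millisecond input unit, and the sample data are hypothetical, and the actual NIHTB V3 scoring procedure may differ in its details (for example, in trial trimming or rescaling).

```python
def rate_correct_score(rts_ms, correct):
    """Return correct responses per second of response time.

    Assumed definition (not taken from the abstract):
        RCS = number of correct trials / sum of all response times in seconds

    rts_ms  -- response time of each trial, in milliseconds (hypothetical unit)
    correct -- one boolean per trial, True if the response was correct
    """
    if len(rts_ms) != len(correct):
        raise ValueError("rts_ms and correct must have the same length")
    total_seconds = sum(rts_ms) / 1000
    if total_seconds <= 0:
        raise ValueError("total response time must be positive")
    return sum(1 for c in correct if c) / total_seconds


# Hypothetical data: two participants who both answer every trial correctly.
fast = rate_correct_score([500, 500, 500, 500], [True, True, True, True])
slow = rate_correct_score([1000, 1000, 1000, 1000], [True, True, True, True])

# Hypothetical data: one error in four trials.
mixed = rate_correct_score([500, 700, 800, 1000], [True, True, False, True])
```

In this made-up example both `fast` and `slow` have 100% accuracy, so an accuracy-only score cannot separate them, while the rate-correct score can (2.0 versus 1.0 correct responses per second). This is one way to read the abstract's remark that the score was largely free from ceiling effects.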