Coyne Joseph T, Jamison Laura, Strong Kaylin, Sibley Ciara, Foroughi Cyrus, Melick Sarah
Information Technology Division, Naval Research Laboratory, Washington, DC, USA.
Strategic Analysis, Arlington, VA, USA.
Cogn Res Princ Implic. 2025 Aug 20;10(1):51. doi: 10.1186/s41235-025-00660-3.
This paper examines how process-based spatial ability and attention measures taken within a high-stakes battery used to select pilots in the US Navy compare to lab-based measures of the same constructs. Process-based measures typically work by having individuals perform either a novel task or a task with novel stimuli. However, applicants often spend time practicing the tasks before taking the battery. A group of 307 Naval Flight Students participated in the study, in which they completed several spatial ability, attention, and general processing measures. One of the spatial tasks used in the study was the same as the spatial task in the Navy's pilot selection battery, which all of the participants had taken. All of the lab spatial ability measures, including the one used in the selection battery, were highly correlated and loaded onto the same spatial ability factor. However, the high-stakes spatial subtest was not correlated with any of the lab spatial measures, including the same test administered in the lab. The lab spatial ability data were also correlated with training outcomes, whereas the high-stakes process-based spatial and attention measures were not. The high-stakes attention measure was weakly correlated with some of the general processing measures. The pattern of results suggests that familiarity with the spatial and attention tasks in the high-stakes environment may be negating those tests' ability to measure the constructs they were designed to measure, and also reducing their effectiveness in predicting training performance.

Statement of significance: This paper addresses an increasingly difficult challenge the Navy faces in aviation selection: applicants are highly motivated and have access to unofficial replicas of the Navy's test battery. The challenge is specific to process-based measures, such as spatial ability and attention, that rely on some degree of novelty to work. When applicants practice these types of tests, they can practice to the test, memorize items, and learn strategies, all of which impair the test's ability to measure the cognitive construct it was designed to measure and reduce its ability to predict flight training outcomes. This is particularly problematic because unofficial test preparation software can replicate a new test within days. While the data presented here are limited to spatial ability and attention within military pilot selection, the findings apply to a much broader community of researchers. Anyone developing a high-stakes test with a large and motivated applicant pool may also see their process-based measures perform differently in a high-stakes environment than in a low-stakes laboratory environment in which participants are naïve to the tasks they are taking. The extent to which practice can alter the effectiveness of high-stakes tests is an important question. The results of the paper suggest that test developers should assume participants are practiced and should assess the extent to which practice on process-based measures affects each task's ability to measure the construct of interest and predict performance.