Helland-Riise Fredrik, Norrøne Tore Nøttestad, Andersson Björn
Centre for Educational Measurement (CEMO), University of Oslo, 0318 Oslo, Norway.
The Norwegian Armed Forces, 0593 Oslo, Norway.
J Intell. 2024 Aug 29;12(9):82. doi: 10.3390/jintelligence12090082.
Figural matrices tests are common in intelligence research and have been used to draw conclusions regarding secular changes in intelligence. However, their measurement properties have seldom been evaluated with large samples that include both sexes. Using data from the Norwegian Armed Forces, we study the measurement properties of a test used for selection in military recruitment. Item-level data were available from 113,671 Norwegian adolescents (32% female) tested between 2011 and 2017. Using item response theory (IRT), we characterize the measurement properties of the test in terms of difficulty, discrimination, precision, and measurement invariance between males and females. We estimate sex differences in the mean and variance of the latent variable and evaluate the impact of violations of measurement invariance on the estimated distribution parameters. The results show that unidimensional IRT models fit well in all groups and years. There is little difference in precision and test difficulty between males and females, and precision is generally poor in the upper part of the scale. In the sample, male latent proficiency is estimated to be slightly higher on average, with higher variance. Adjusting for violations of measurement invariance generally reduces the sex differences but does not eliminate them. We conclude that previous studies using the Norwegian general mental ability (GMA) data must be interpreted with more caution, but that the test should measure males and females equally fairly.