Sideridis Georgios, Jaffari Fathima
Boston Children's Hospital and Harvard Medical School, Boston, MA, United States.
Department of Research, National and Kapodistrian University of Athens, Athens, Greece.
Front Psychol. 2023 Aug 22;14:1210958. doi: 10.3389/fpsyg.2023.1210958. eCollection 2023.
The purpose of the present study was to evaluate the reliability and validity of the General Aptitude Test (GAT), a national instrument for the measurement of aptitude/achievement in the Kingdom of Saudi Arabia, as a function of the time of day of testing. Participants were 722 students who took the GAT in both morning and evening administrations in a within-person pre-post design. Participants were matched for gender, parental education, and test center characteristics (i.e., size). The GAT was evaluated for its psychometric properties and its measurement invariance across time of day. Results pointed to significant misfit under an exact invariance protocol. First, a large number of items were non-invariant, indicating Differential Item Functioning (DIF). Second, internal consistency reliabilities were consistently lower during morning testing than during evening testing, as evidenced by both statistical and visual means. Third, concerns about dimensionality were raised for the morning administration relative to the evening administration. Last, comparison of performance levels indicated that morning testing was associated with significant decrements in performance across all domains compared with evening testing. The results have implications for the validity of measurement and for public testing policy if test validity during morning administration is compromised.
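To make the internal consistency comparison concrete, the following is a minimal sketch of how a reliability coefficient could be computed separately for each administration. It assumes Cronbach's alpha as the coefficient, which the abstract does not specify, and it uses small made-up 0/1 item responses rather than GAT data. The names `cronbach_alpha`, `morning`, and `evening` are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in scores):
        raise ValueError("every examinee must have the same number of items")

    # Transpose rows (examinees) into columns (items).
    items = list(zip(*scores))
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical toy responses (1 = correct, 0 = incorrect); NOT data from the study.
morning = [[1, 1, 0], [0, 0, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0], [0, 0, 1]]
evening = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1]]

alpha_morning = cronbach_alpha(morning)
alpha_evening = cronbach_alpha(evening)
print(f"morning alpha = {alpha_morning:.3f}")
print(f"evening alpha = {alpha_evening:.3f}")
```

In this toy example the evening responses are more consistent across items, so the evening coefficient comes out higher. A real analysis would use the full item-level response matrix for each administration and would typically also report confidence intervals for the difference.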