Kaya Deniz, Yavuz Selim
Department of Mathematics Education, Faculty of Education, Nevsehir Hacı Bektas Veli University, 50300 Nevsehir, Türkiye.
Department of Curriculum and Instruction, School of Education, Indiana University Bloomington, Bloomington, IN 47405, USA.
J Intell. 2025 Apr 2;13(4):43. doi: 10.3390/jintelligence13040043.
This study investigates the potential of generative artificial intelligence tools to address the cognitive challenges humans encounter during problem-solving. The performance of the ChatGPT-4o and GPT-4 models on the NAEP mathematics assessments was evaluated, particularly in relation to the cognitive demands placed on students. Sixty NAEP mathematics assessment tasks, coded by field experts, were analyzed within a framework of cognitive complexity. ChatGPT-4o and GPT-4 provided responses to each question, which were then evaluated using NAEP's scoring criteria. The dataset was analyzed using the average performance scores of students who answered correctly and the item-wise response percentages. The results indicated that ChatGPT-4o and GPT-4 outperformed most students on individual items in the NAEP mathematics assessment. Furthermore, as cognitive demand increased, higher performance scores were required to answer questions correctly. This trend was observed across the 4th, 8th, and 12th grades, though ChatGPT-4o and GPT-4 did not demonstrate statistically significant sensitivity to increased cognitive demands at the 12th-grade level.