Can Generative AI and ChatGPT Break Human Supremacy in Mathematics and Reshape Competence in Cognitive-Demanding Problem-Solving Tasks?

Authors

Kaya Deniz, Yavuz Selim

Affiliations

Department of Mathematics Education, Faculty of Education, Nevsehir Hacı Bektas Veli University, 50300 Nevsehir, Türkiye.

Department of Curriculum and Instruction, School of Education, Indiana University Bloomington, Bloomington, IN 47405, USA.

Publication details

J Intell. 2025 Apr 2;13(4):43. doi: 10.3390/jintelligence13040043.

DOI:10.3390/jintelligence13040043
PMID:40278052
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12027771/
Abstract

This study investigates the potential of generative artificial intelligence tools in addressing cognitive challenges encountered by humans during problem-solving. The performance of ChatGPT-4o and GPT-4 models in the NAEP mathematics assessments was evaluated, particularly in relation to the cognitive demands placed on students. Sixty NAEP mathematics assessment tasks, coded by field experts, were analyzed within a framework of cognitive complexity. ChatGPT-4o and GPT-4 provided responses to each question, which were then evaluated using NAEP's scoring criteria. The study's dataset was analyzed using the average performance scores of students who answered correctly and the item-wise response percentages. The results indicated that ChatGPT-4o and GPT-4 outperformed most students on individual items in the NAEP mathematics assessment. Furthermore, as the cognitive demand increased, higher performance scores were required to answer questions correctly. This trend was observed across the 4th, 8th, and 12th grades, though ChatGPT-4o and GPT-4 did not demonstrate statistically significant sensitivity to increased cognitive demands at the 12th-grade level.
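The item-level analysis summarized above groups NAEP items by cognitive demand and compares the average scale score of students who answered each item correctly. The sketch below illustrates that comparison. It is not the authors' procedure. The level names `low`/`moderate`/`high`, the `Item` fields, and the sample scores are all hypothetical, and the paper's significance testing is not reproduced.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Assumed ordering of cognitive-demand levels (hypothetical labels).
LEVELS = ("low", "moderate", "high")


@dataclass(frozen=True)
class Item:
    item_id: str
    complexity: str           # one of LEVELS
    avg_correct_score: float  # mean scale score of students who answered correctly
    model_correct: bool       # whether the model's response was scored correct


def mean_score_by_level(items):
    """Mean of the correct-responder scale scores within each level."""
    groups = defaultdict(list)
    for item in items:
        groups[item.complexity].append(item.avg_correct_score)
    return {lvl: mean(groups[lvl]) for lvl in LEVELS if groups[lvl]}


def rises_with_demand(level_means):
    """True if the mean score strictly increases across the levels present."""
    ordered = [level_means[lvl] for lvl in LEVELS if lvl in level_means]
    return all(a < b for a, b in zip(ordered, ordered[1:]))


def model_accuracy_by_level(items):
    """Fraction of items in each level that the model answered correctly."""
    total, correct = defaultdict(int), defaultdict(int)
    for item in items:
        total[item.complexity] += 1
        correct[item.complexity] += int(item.model_correct)
    return {lvl: correct[lvl] / total[lvl] for lvl in LEVELS if total[lvl]}


# Hypothetical toy data, not values from the study.
SAMPLE_ITEMS = [
    Item("i1", "low", 240.0, True),
    Item("i2", "low", 250.0, True),
    Item("i3", "moderate", 270.0, True),
    Item("i4", "moderate", 280.0, False),
    Item("i5", "high", 300.0, True),
]

if __name__ == "__main__":
    means = mean_score_by_level(SAMPLE_ITEMS)
    print(means)
    print("rises with demand:", rises_with_demand(means))
    print(model_accuracy_by_level(SAMPLE_ITEMS))
```

Running `rises_with_demand` on the per-level means checks the qualitative trend described in the abstract. A real replication would need the NAEP item statistics for each grade and an appropriate statistical test.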


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e480/12027771/9bc57d3a520b/jintelligence-13-00043-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e480/12027771/342104148fb0/jintelligence-13-00043-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e480/12027771/db4580ce6743/jintelligence-13-00043-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e480/12027771/aaedaae80b37/jintelligence-13-00043-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e480/12027771/20202bc3118e/jintelligence-13-00043-g005.jpg

Similar articles

1. Can Generative AI and ChatGPT Break Human Supremacy in Mathematics and Reshape Competence in Cognitive-Demanding Problem-Solving Tasks?
J Intell. 2025 Apr 2;13(4):43. doi: 10.3390/jintelligence13040043.
2. GPT-4o’s competency in answering the simulated written European Board of Interventional Radiology exam compared to a medical student and experts in Germany and its ability to generate exam items on interventional radiology: a descriptive study.
J Educ Eval Health Prof. 2024;21:21. doi: 10.3352/jeehp.2024.21.21. Epub 2024 Aug 20.
3. Generative pre-trained transformer 4o (GPT-4o) in solving text-based multiple response questions for European Diploma in Radiology (EDiR): a comparative study with radiologists.
Insights Imaging. 2025 Mar 22;16(1):66. doi: 10.1186/s13244-025-01941-7.
4. Accuracy and quality of ChatGPT-4o and Google Gemini performance on image-based neurosurgery board questions.
Neurosurg Rev. 2025 Mar 25;48(1):320. doi: 10.1007/s10143-025-03472-7.
5. AI-powered standardised patients: evaluating ChatGPT-4o's impact on clinical case management in intern physicians.
BMC Med Educ. 2025 Feb 20;25(1):278. doi: 10.1186/s12909-025-06877-6.
6. Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o.
Clin Rheumatol. 2024 Nov;43(11):3507-3513. doi: 10.1007/s10067-024-07154-5. Epub 2024 Sep 28.
7. ChatGPT's Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini.
JMIR Med Educ. 2025 Mar 5;11:e65108. doi: 10.2196/65108.
8. Evaluating Chat Generative Pretrained Transformer (GPT-4o) Problem-Solving Performance in the Japan Certificate Examination for Biomedical Engineering Class 1.
Cureus. 2025 Mar 23;17(3):e81029. doi: 10.7759/cureus.81029. eCollection 2025 Mar.
9. A comparative analysis of GPT-3.5 and GPT-4.0 on a multiple-choice ophthalmology question bank: A study on artificial intelligence developments.
Rom J Ophthalmol. 2024 Oct-Dec;68(4):367-371. doi: 10.22336/rjo.2024.67.
10. Evaluating the performance of GPT-3.5, GPT-4, and GPT-4o in the Chinese National Medical Licensing Examination.
Sci Rep. 2025 Apr 23;15(1):14119. doi: 10.1038/s41598-025-98949-2.
