GPT-4's capabilities for formative and summative assessments in Norwegian medicine exams-an intrinsic case study in the early phase of intervention.

Author

Krumsvik Rune Johan

Affiliation

Department of Education, University of Bergen, Bergen, Norway.

Publication information

Front Med (Lausanne). 2025 Apr 10;12:1441747. doi: 10.3389/fmed.2025.1441747. eCollection 2025.

DOI:10.3389/fmed.2025.1441747
PMID:40276737
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12018347/
Abstract

The growing integration of artificial intelligence (AI) in education has paved the way for innovative assessment methods. This study explores the capabilities of GPT-4, a large language model (LLM), on a medicine exam and for formative and summative assessments in Norwegian educational settings. It builds on our previous work to explore how AI, specifically GPT-4, can enhance assessment practices by evaluating its performance on a full-scale medical multiple-choice exam. Prior studies have revealed that LLMs have some potential in medical education but have not specifically examined how GPT-4 can enhance formative and summative assessments in medical education. This study therefore contributes to filling gaps in the current knowledge by examining GPT-4's capabilities for formative and summative assessment in medical education in Norway. For this purpose, a case study design was employed, and the primary data sources were 110 exam questions, 10 blinded exam questions, and 2 patient cases within medicine. The findings from this intrinsic case study revealed that GPT-4 performed well on the summative assessment, with a robust handling of the Norwegian medical language. Further, GPT-4 demonstrated a reliable evaluation of comprehensive student exams, such as patient cases, and thus aligned closely with human assessments. The findings suggest that GPT-4 can improve formative assessment by providing timely, personalized feedback to support student learning. This study highlights the importance of both an empirical and a theoretical understanding of the gap between traditional assessment methods and educational practices on the one hand and AI-enhanced approaches on the other, particularly the importance of chain-of-thought prompting, how AI can scaffold tutoring, and assessment practices. However, continuous refinement and human oversight remain crucial to ensure the effective and responsible integration of LLMs such as GPT-4 into educational settings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8edd/12018347/abb76189b334/fmed-12-1441747-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8edd/12018347/24698b00c60d/fmed-12-1441747-g001.jpg
