
Performance Comparison of Junior Residents and ChatGPT in the Objective Structured Clinical Examination (OSCE) for Medical History Taking and Documentation of Medical Records: Development and Usability Study.

Authors

Huang Ting-Yun, Hsieh Pei Hsing, Chang Yung-Chun

Affiliations

Shuang-Ho Hospital, Taipei Medical University, New Taipei City, Taiwan.

Graduate Institute of Data Science, Taipei Medical University, Zhonghe District, New Taipei City, Taiwan.

Publication Information

JMIR Med Educ. 2024 Nov 21;10:e59902. doi: 10.2196/59902.

Abstract

BACKGROUND

This study explores the cutting-edge abilities of large language models (LLMs) such as ChatGPT in medical history taking and medical record documentation, with a focus on their practical effectiveness in clinical settings, an area vital to the progress of medical artificial intelligence.

OBJECTIVE

Our aim was to assess the capability of ChatGPT versions 3.5 and 4.0 in performing medical history taking and medical record documentation in simulated clinical environments. The study compared the performance of nonmedical individuals using ChatGPT with that of junior medical residents.

METHODS

A simulation involving standardized patients was designed to mimic authentic medical history-taking interactions. Five nonmedical participants used ChatGPT versions 3.5 and 4.0 to take medical histories and document medical records, mirroring the tasks performed by 5 junior residents in identical scenarios. A total of 10 diverse scenarios were examined.

RESULTS

Two senior emergency physicians evaluated the medical documentation created by laypersons with ChatGPT assistance and that created by junior residents, using the audio recordings and the final medical records. The assessment used the Objective Structured Clinical Examination benchmarks in Taiwan as a reference. ChatGPT-4.0 exhibited substantial improvements over its predecessor and met or exceeded the performance of its human counterparts in both checklist and global assessment scores. Although the overall quality of human consultations remained higher, ChatGPT-4.0's proficiency in medical documentation was notably promising.

CONCLUSIONS

The performance of ChatGPT 4.0 was on par with that of human participants in Objective Structured Clinical Examination evaluations, signifying its potential in medical history taking and medical record documentation. Despite this, the superiority of human consultations in terms of quality was evident. The study underscores both the promise and the current limitations of LLMs in clinical practice.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b84c/11612517/1585bdbc0ea8/mededu-v10-e59902-g001.jpg
