Department of Oral and Maxillofacial Surgery, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany.
Department of Plastic Surgery and Hand Surgery, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany.
Sci Rep. 2024 Jun 12;14(1):13553. doi: 10.1038/s41598-024-63997-7.
ChatGPT has garnered attention as a multifaceted AI chatbot with potential applications in medicine. Despite intriguing preliminary findings in areas such as clinical management and patient education, a substantial knowledge gap remains in understanding the opportunities and limitations of ChatGPT's capabilities, especially in medical test-taking and education. A total of n = 2,729 USMLE Step 1 practice questions were extracted from the Amboss question bank. After exclusion of 352 image-based questions, the remaining 2,377 text-based questions were categorized, entered manually into ChatGPT, and its responses were recorded. ChatGPT's overall performance was analyzed by question difficulty, category, and content with regard to specific signal words and phrases. ChatGPT achieved an overall accuracy of 55.8% on the n = 2,377 USMLE Step 1 preparation questions obtained from the Amboss online question bank. Its performance showed a significant inverse correlation with question difficulty (r = -0.306; p < 0.001), while maintaining accuracy comparable to the human user peer group across difficulty levels. Notably, ChatGPT performed better on serology-related questions (61.1% vs. 53.8%; p = 0.005) but struggled with ECG-related content (42.9% vs. 55.6%; p = 0.021). ChatGPT also performed significantly worse on pathophysiology-related question stems (signal phrase: "what is the most likely/probable cause"). Overall, ChatGPT performed consistently across question categories and difficulty levels. These findings emphasize the need for further investigation of the potential and limitations of ChatGPT in medical examination and education.
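The statistics reported above (an overall accuracy rate, a difficulty-performance correlation, and category-wise accuracy comparisons) follow a standard analysis pattern. The Python sketch below is a minimal illustration of that pattern, not the authors' code: the column names (difficulty, correct, category) and the toy data are hypothetical stand-ins for the Amboss question records described in the abstract.

    # Minimal sketch (assumed record format, not the authors' pipeline):
    # one row per question answered by ChatGPT.
    import pandas as pd
    from scipy.stats import pearsonr, chi2_contingency

    df = pd.DataFrame({
        "difficulty": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],   # Amboss-style 1-5 rating
        "correct":    [1, 1, 0, 0, 0, 1, 0, 1, 0, 1],   # 1 = answered correctly
        "category":   ["serology", "ecg"] * 5,           # hypothetical labels
    })

    # Overall accuracy (reported as 55.8% in the study).
    print(f"Overall accuracy: {df['correct'].mean():.1%}")

    # Correlation between difficulty and correctness
    # (the study reports r = -0.306, p < 0.001 over n = 2,377 questions).
    r, p = pearsonr(df["difficulty"], df["correct"])
    print(f"r = {r:.3f}, p = {p:.3f}")

    # Chi-square test comparing accuracy between two content categories,
    # analogous to the serology vs. ECG comparisons in the abstract.
    table = pd.crosstab(df["category"], df["correct"])
    chi2, p_cat, _, _ = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p_cat:.3f}")

On real data of this size, the Pearson correlation between a 1-5 difficulty rating and a binary correctness indicator is equivalent to a point-biserial correlation, which matches the r-and-p reporting style used in the abstract.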