The Advanced Reasoning Capabilities of Large Language Models for Detecting Contraindicated Options in Medical Exams.

Author information

Yano Yuichiro, Ohashi Mizuki, Miyagami Taiju, Mori Hirotake, Nishizaki Yuji, Daida Hiroyuki, Naito Toshio

Affiliations

Department of General Medicine, Juntendo University Faculty of Medicine, 2-1-1, Hongo, Bunkyo-Ku, Tokyo, 113-8421, Japan, 81 3-3813-3111.

AI Incubation Farm, Juntendo University Faculty of Medicine, Tokyo, Japan.

Publication information

JMIR Med Inform. 2025 May 12;13:e68527. doi: 10.2196/68527.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12088613/

Abstract

Enhancing clinical reasoning and reducing diagnostic errors are essential in medical practice; OpenAI-o1, with advanced reasoning capabilities, performed better than GPT-4 on 15 Japanese National Medical Licensing Examination questions (accuracy: 100% vs 80%; contraindicated option detection: 87% vs 73%), though findings are preliminary due to the small sample size.
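
The percentages above can be related back to whole-question counts with a little arithmetic. The sketch below is illustrative only: the counts (15, 12, 13, 11 out of 15) are inferred here by rounding and are not stated in the abstract, and the Wilson score interval is an added illustration of why a 15-question sample supports only preliminary conclusions, not an analysis reported by the authors. The names `implied_count` and `wilson_interval` are hypothetical helpers.

```python
import math

N_QUESTIONS = 15  # sample size stated in the abstract


def implied_count(pct, n=N_QUESTIONS):
    """Nearest whole count consistent with a reported percentage.

    This is an inference from rounding, not a figure given in the source.
    """
    return round(pct / 100 * n)


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Percentages as reported in the abstract.
reported = {
    "OpenAI-o1 accuracy": 100,
    "GPT-4 accuracy": 80,
    "OpenAI-o1 contraindicated detection": 87,
    "GPT-4 contraindicated detection": 73,
}

for label, pct in reported.items():
    k = implied_count(pct)
    lo, hi = wilson_interval(k, N_QUESTIONS)
    print(f"{label}: {k}/{N_QUESTIONS} (interval {lo:.2f} to {hi:.2f})")
```

Under these assumptions, even a perfect 15/15 score has a lower interval bound of roughly 0.80, so the intervals for the two models overlap substantially, which is consistent with the abstract's caution about the small sample.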

Similar articles

1

The Advanced Reasoning Capabilities of Large Language Models for Detecting Contraindicated Options in Medical Exams.

JMIR Med Inform. 2025 May 12;13:e68527. doi: 10.2196/68527.

2

Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study.

J Med Internet Res. 2025 Feb 7;27:e65146. doi: 10.2196/65146.

3

An Evaluation of the Performance of OpenAI-o1 and GPT-4o in the Japanese National Examination for Physical Therapists.

Cureus. 2025 Jan 6;17(1):e76989. doi: 10.7759/cureus.76989. eCollection 2025 Jan.

4

Influence of Model Evolution and System Roles on ChatGPT's Performance in Chinese Medical Licensing Exams: Comparative Study.

JMIR Med Educ. 2024 Aug 13;10:e52784. doi: 10.2196/52784.

5

Performance of ChatGPT-4o on the Japanese Medical Licensing Examination: Evalution of Accuracy in Text-Only and Image-Based Questions.

JMIR Med Educ. 2024 Dec 24;10:e63129. doi: 10.2196/63129.

6

Benchmarking Vision Capabilities of Large Language Models in Surgical Examination Questions.

J Surg Educ. 2025 Apr;82(4):103442. doi: 10.1016/j.jsurg.2025.103442. Epub 2025 Feb 9.

7

Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.

J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.

8

Evaluating the Effectiveness of advanced large language models in medical Knowledge: A Comparative study using Japanese national medical examination.

Int J Med Inform. 2025 Jan;193:105673. doi: 10.1016/j.ijmedinf.2024.105673. Epub 2024 Oct 28.

9

Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study.

JMIR Med Educ. 2024 Mar 12;10:e54393. doi: 10.2196/54393.

10

Performance Assessment of GPT 4.0 on the Japanese Medical Licensing Examination.

Curr Med Sci. 2024 Dec;44(6):1148-1154. doi: 10.1007/s11596-024-2932-9. Epub 2024 Oct 26.

Cited by

1

AI-Based EMG Reporting: A Randomized Controlled Trial.

J Neurol. 2025 Aug 22;272(9):586. doi: 10.1007/s00415-025-13261-3.

References

1

OpenAI o1-Preview vs. ChatGPT in Healthcare: A New Frontier in Medical AI Reasoning.

Cureus. 2024 Oct 1;16(10):e70640. doi: 10.7759/cureus.70640. eCollection 2024 Oct.

2

Overconfidence as a cause of diagnostic error in medicine.

Am J Med. 2008 May;121(5 Suppl):S2-23. doi: 10.1016/j.amjmed.2008.01.001.

3

Educational strategies to promote clinical diagnostic reasoning.

N Engl J Med. 2006 Nov 23;355(21):2217-25. doi: 10.1056/NEJMra054782.
