Leis Angela, Mayer Miguel-Angel, Mayer Alex
Hospital del Mar Research Institute, Barcelona, Spain.
Hospital del Mar, Barcelona, Spain.
Stud Health Technol Inform. 2025 May 15;327:1054-1058. doi: 10.3233/SHTI250544.
The growing use of Artificial Intelligence (AI) in healthcare, and in particular the potential of generative AI models such as ChatGPT-4, is a trending topic. This study examines how ChatGPT-4 performed on the national Medicine Residency exam in Spain, a highly selective test for access to the medical specialization training program known as MIR. ChatGPT-4 answered 210 questions, including 25 that required image interpretation. The chatbot correctly answered 150 out of 200 questions, achieving an estimated ranking of around 1,900-2,300 out of 11,577 candidates, a performance that would allow access to most medical specialties in Spain. No significant differences were found between questions requiring image analysis and those that did not, but ChatGPT-4 struggled with the more difficult questions, showing a higher error rate on complex problems, much as human candidates do. Despite its potential as an educational and problem-solving tool, the study highlights ChatGPT's limitations, including occasional "AI hallucinations" (incorrect or nonsensical answers) and variability in its responses when questions were repeated. The study emphasizes that while AI tools such as ChatGPT can assist in education and medical tasks, they cannot replace qualified healthcare professionals, and their output requires careful verification.