The role of artificial intelligence in medical education: an evaluation of Large Language Models (LLMs) on the Turkish Medical Specialty Training Entrance Exam.

Author Information

Koçak Murat, Oğuz Ali Kemal, Akçalı Zafer

Affiliations

Department of Medical Informatics, Faculty of Medicine, Baskent University, Ankara, Turkey.

Department of Internal Medicine, Faculty of Medicine, Baskent University, Ankara, Turkey.

Publication Information

BMC Med Educ. 2025 Apr 25;25(1):609. doi: 10.1186/s12909-025-07148-0.


DOI:10.1186/s12909-025-07148-0
PMID:40281510
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12023555/
Abstract

OBJECTIVE: To evaluate the performance of advanced large language models (LLMs)-OpenAI-ChatGPT 4, Google AI-Gemini 1.5 Pro, Cohere-Command R+ and Meta AI-Llama 3 70B on questions from the Turkish Medical Specialty Training Entrance Exam (2021, 1st semester) and analyze their answers for user interpretability in languages other than English.

METHODS: The study used questions from the Basic Medical Sciences and Clinical Medical Sciences exams of the Turkish Medical Specialty Training Entrance Exam held on March 21, 2021. The 240 questions were presented to the LLMs in Turkish, and their responses were evaluated based on the official answers published by the Student Selection and Placement Centre.

RESULTS: ChatGPT 4 was the best-performing model with an overall accuracy of 88.75%. Llama 3 70B followed closely with 79.17% accuracy. Gemini 1.5 Pro achieved 78.13% accuracy, while Command R+ lagged with 50% accuracy. ChatGPT 4 demonstrated strengths in both basic and clinical medical science questions. Performance varied across question difficulties, with ChatGPT 4 maintaining high accuracy even on the most challenging questions.

CONCLUSIONS: GPT-4 and Llama 3 70B achieved satisfactory results on the Turkish Medical Specialty Training Entrance Exam, demonstrating their potential as safe sources for basic medical sciences and clinical medical sciences knowledge in languages other than English. These LLMs could be valuable resources for medical education and clinical support in non-English speaking areas. However, Gemini 1.5 Pro and Command R+ show potential but need significant improvement to compete with the best-performing models.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15e7/12023555/83276dbdcb38/12909_2025_7148_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15e7/12023555/952ad95404ef/12909_2025_7148_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15e7/12023555/db03a5a50f8f/12909_2025_7148_Fig3_HTML.jpg

Similar Articles

[1]
The role of artificial intelligence in medical education: an evaluation of Large Language Models (LLMs) on the Turkish Medical Specialty Training Entrance Exam.

BMC Med Educ. 2025-4-25

[2]
Benchmarking LLM chatbots' oncological knowledge with the Turkish Society of Medical Oncology's annual board examination questions.

BMC Cancer. 2025-2-4

[3]
Performance of three artificial intelligence (AI)-based large language models in standardized testing; implications for AI-assisted dental education.

J Periodontal Res. 2025-2

[4]
Performance of artificial intelligence on Turkish dental specialization exam: can ChatGPT-4.0 and gemini advanced achieve comparable results to humans?

BMC Med Educ. 2025-2-10

[5]
Factors Associated With the Accuracy of Large Language Models in Basic Medical Science Examinations: Cross-Sectional Study.

JMIR Med Educ. 2025-1-13

[6]
Evaluating the Effectiveness of advanced large language models in medical Knowledge: A Comparative study using Japanese national medical examination.

Int J Med Inform. 2025-1

[7]
Evaluating ChatGPT and Google Gemini Performance and Implications in Turkish Dental Education.

Cureus. 2025-1-11

[8]
Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study.

JMIR Form Res. 2024-12-17

[9]
Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard.

JMIR Med Educ. 2024-2-21

[10]
Large Language Models in Biochemistry Education: Comparative Evaluation of Performance.

JMIR Med Educ. 2025-4-10

Cited By

[1]
AI Foundations in China's Medical Physiology Education: Pedagogical Practices and Systemic Challenges.

Adv Med Educ Pract. 2025-8-15

References

[1]
Development trends and knowledge framework of artificial intelligence (AI) applications in oncology by years: a bibliometric analysis from 1992 to 2022.

Discov Oncol. 2024-10-16

[2]
Evaluating capabilities of large language models: Performance of GPT-4 on surgical knowledge assessments.

Surgery. 2024-4

[3]
ChatGPT sits the DFPH exam: large language model performance and potential to support public health learning.

BMC Med Educ. 2024-1-11

[4]
Reshaping medical education: Performance of ChatGPT on a PES medical examination.

Cardiol J. 2024

[5]
Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases: potentials and challenges for ai-assisted medical education and decision making in radiation oncology.

Front Oncol. 2023-9-14

[6]
Performance of ChatGPT on the Peruvian National Licensing Medical Examination: Cross-Sectional Study.

JMIR Med Educ. 2023-9-28

[7]
Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study.

JMIR Med Educ. 2023-6-29

[8]
How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.

JMIR Med Educ. 2023-2-8
