Exploring the role of artificial intelligence in Turkish orthopedic progression exams.

Author information

Ayik Gokhan, Kolac Ulas Can, Aksoy Taha, Yilmaz Abdurrahman, Sili Mazlum Veysel, Tokgozoglu Mazhar, Huri Gazi

Affiliations

Department of Orthopedics and Traumatology, Yuksek Ihtisas University Faculty of Medicine, Ankara, Türkiye.

Department of Orthopedics and Traumatology, Hacettepe University Faculty of Medicine, Ankara, Türkiye.

Publication information

Acta Orthop Traumatol Turc. 2025 Mar 17;59(1):18-26. doi: 10.5152/j.aott.2025.24090.

Abstract

OBJECTIVE

The aim of this study was to evaluate and compare the performance of the artificial intelligence (AI) models ChatGPT-3.5, ChatGPT-4, and Gemini on the Turkish Specialization Training and Development Examination (UEGS) to determine their utility in medical education and their potential to improve patient care.

METHODS

This retrospective study analyzed the responses of ChatGPT-3.5, ChatGPT-4, and Gemini to 1000 true-or-false UEGS questions administered over 5 years (2018-2023). The questions, spanning 9 orthopedic subspecialties, were categorized by 2 independent residents, with discrepancies resolved by a senior author. Each AI model was restarted for every query so that no prior information was retained across responses. Performance was evaluated by calculating net scores and comparing them with orthopedic resident scores obtained from the Turkish Orthopedics and Traumatology Education Council (TOTEK) database. Statistical analyses included chi-squared tests, Bonferroni-adjusted Z tests, Cochran's Q test, and receiver operating characteristic (ROC) analysis to determine the optimal question length for AI accuracy.
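
The abstract names the statistical tests but not their implementation. Below is a minimal sketch of how the model comparison and the question-length ROC analysis could be reproduced, assuming a per-question table with one binary correctness column per model and a letter count; the file name and column names are hypothetical, and the Bonferroni-adjusted pairwise Z tests are omitted for brevity.

```python
# Minimal sketch of the reported statistical comparisons (not the
# authors' code). Assumes a hypothetical CSV with one row per question:
# 'gpt35', 'gpt4', 'gemini' (1 = correct, 0 = incorrect) and
# 'letter_count' (question length in letters).
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.metrics import roc_auc_score, roc_curve
from statsmodels.stats.contingency_tables import cochrans_q

df = pd.read_csv("uegs_responses.csv")  # hypothetical file name
models = ["gpt35", "gpt4", "gemini"]

# Cochran's Q: do the three models differ on the same 1000 questions?
q = cochrans_q(df[models].to_numpy())
print(f"Cochran's Q = {q.statistic:.2f}, p = {q.pvalue:.4f}")

# Chi-squared test on pooled correct/incorrect counts per model.
counts = [[int(df[m].sum()), int((1 - df[m]).sum())] for m in models]
chi2, p, _, _ = stats.chi2_contingency(counts)
print(f"chi-squared = {chi2:.2f}, p = {p:.4f}")

# ROC analysis of question length as a predictor of a correct
# ChatGPT-4 answer; length is negated so a higher score means a
# shorter question, matching the reported shorter-is-better effect.
fpr, tpr, thresholds = roc_curve(df["gpt4"], -df["letter_count"])
auc = roc_auc_score(df["gpt4"], -df["letter_count"])
cutoff = -thresholds[np.argmax(tpr - fpr)]  # Youden's J optimum
print(f"AUC = {auc:.3f}, optimal length cutoff ~ {cutoff:.0f} letters")
```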

RESULTS

Significant differences in AI tool accuracy were observed across different years and subspecialties (P < .001). ChatGPT-4 consistently outperformed the other models, achieving the highest overall accuracy (reaching 95% in specific subspecialties). Notably, ChatGPT-4 demonstrated superior performance in Basic and General Orthopedics and in Foot and Ankle Surgery, whereas Gemini and ChatGPT-3.5 showed variable accuracy across topics and years. Receiver operating characteristic analysis revealed a significant relationship between shorter letter counts and higher accuracy for ChatGPT-4 (P = .002). ChatGPT-4 showed a significant negative correlation between letter count and accuracy across all years (r = -0.099, P = .002) and, unlike the other AI models, outperformed residents in Basic and General Orthopedics (P = .015) and Trauma (P = .012).
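
The abstract does not name the correlation estimator; the reported r between a continuous letter count and a binary correctness outcome is consistent with a point-biserial coefficient, so the following sketch assumes one, reusing the hypothetical df from the Methods sketch.

```python
from scipy import stats

# Assumed point-biserial correlation between question length and
# ChatGPT-4 correctness; a small negative r means longer questions
# were answered correctly slightly less often ('df' as sketched above).
r, p = stats.pointbiserialr(df["gpt4"], df["letter_count"])
print(f"r = {r:.3f}, p = {p:.3f}")
```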

CONCLUSION

The findings underscore the advancing role of AI in the medical field, with ChatGPT-4 demonstrating significant potential as a tool for medical education and clinical decision-making. Continuous evaluation and refinement of AI technologies are essential to enhance their educational and clinical impact.

Similar articles

2
Gemini AI vs. ChatGPT: A comprehensive examination alongside ophthalmology residents in medical knowledge.
Graefes Arch Clin Exp Ophthalmol. 2025 Feb;263(2):527-536. doi: 10.1007/s00417-024-06625-4. Epub 2024 Sep 15.
6
Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT.
Clin Orthop Relat Res. 2023 Aug 1;481(8):1623-1630. doi: 10.1097/CORR.0000000000002704. Epub 2023 May 23.
7
Evaluating ChatGPT and Google Gemini Performance and Implications in Turkish Dental Education.
Cureus. 2025 Jan 11;17(1):e77292. doi: 10.7759/cureus.77292. eCollection 2025 Jan.
10
AI-generated questions for urological competency assessment: a prospective educational study.
BMC Med Educ. 2025 Apr 25;25(1):611. doi: 10.1186/s12909-025-07202-x.

Cited by

1
Correspondence on "Exploring the role of artificial intelligence in Turkish orthopedic progression exams".
Acta Orthop Traumatol Turc. 2025 Jul 18;59(4):230-231. doi: 10.5152/j.aott.2025.25418.
