Department of Ophthalmology, Beyoglu Eye Training and Research Hospital, University of Health Sciences, 34420, Istanbul, Turkey.
Department of Ophthalmology, Sancaktepe Prof. Dr. Ilhan Varank Training and Research Hospital, University of Health Sciences, Istanbul, Turkey.
Int Ophthalmol. 2024 Nov 6;44(1):413. doi: 10.1007/s10792-024-03353-w.
This cross-sectional study evaluated the success rate of ChatGPT in answering questions from the 'Resident Training Development Exam' and compared the results with the performance of ophthalmology residents.
The 75 exam questions, spanning nine sections and three difficulty levels, were presented to ChatGPT, and its responses and explanations were recorded. The readability and complexity of the explanations were analyzed, and the Flesch Reading Ease (FRE) score (0-100) was calculated using the Readable program. Residents were categorized into four groups based on seniority. The overall and seniority-specific success rates of the residents were compared separately with those of ChatGPT.
ChatGPT answered 37 of 69 questions correctly (53.62%). Its success rate was highest in Lens and Cataract (77.77%) and lowest in Pediatric Ophthalmology and Strabismus (0.00%). Among the 789 residents, overall accuracy was 50.37%; accuracy by seniority was 43.49%, 51.30%, 54.91%, and 60.05% for first- through fourth-year residents, respectively. ChatGPT ranked 292nd among the residents. By difficulty, 11 questions were easy, 44 moderate, and 14 difficult; ChatGPT's accuracy at each level was 63.63%, 54.54%, and 42.85%, respectively. The mean FRE score of ChatGPT's responses was 27.56 ± 12.40.
ChatGPT correctly answered 53.6% of questions on an exam designed for residents, a success rate below the average of third-year residents. The readability of ChatGPT's responses is low, making them difficult to understand, and its accuracy decreases as question difficulty increases. These results are expected to change as more information is incorporated into ChatGPT.