
Performance of ChatGPT in ophthalmology exam; human versus AI.

Affiliations

Department of Ophthalmology, Beyoglu Eye Training and Research Hospital, University of Health Sciences, 34420, Istanbul, Turkey.

Department of Ophthalmology, Sancaktepe Prof. Dr. Ilhan Varank Training and Research Hospital, University of Health Sciences, Istanbul, Turkey.

Publication Information

Int Ophthalmol. 2024 Nov 6;44(1):413. doi: 10.1007/s10792-024-03353-w.

Abstract

PURPOSE

This cross-sectional study evaluates the success rate of ChatGPT in answering questions from the 'Resident Training Development Exam' and compares the results with the performance of ophthalmology residents.

METHODS

The 75 exam questions, spanning nine sections and three difficulty levels, were presented to ChatGPT, and its responses and explanations were recorded. The readability and complexity of the explanations were analyzed, and the Flesch Reading Ease (FRE) score (0-100) was obtained with the Readable software. Residents were categorized into four groups based on seniority, and the overall and seniority-specific success rates of the residents were compared separately with those of ChatGPT.
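
The abstract does not restate the FRE formula. For readers unfamiliar with the metric, the following minimal Python sketch applies the standard Flesch Reading Ease calculation with a naive syllable counter; the study itself used the Readable software, so exact scores will differ, and the sample sentence is purely illustrative.

# Illustrative sketch only: the study used the Readable tool; this applies the
# standard Flesch Reading Ease formula with a rough syllable heuristic.
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count vowel groups, treating a trailing 'e' as silent.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / max(len(sentences), 1)
    syllables_per_word = syllables / max(len(words), 1)
    # FRE ranges roughly 0-100; lower scores indicate harder-to-read text.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

print(flesch_reading_ease(
    "Cataract surgery replaces the clouded lens with an intraocular lens implant."
))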

RESULTS

ChatGPT answered 37 of 69 questions correctly (53.62%). The highest success rate was in Lens and Cataract (77.77%) and the lowest in Pediatric Ophthalmology and Strabismus (0.00%). Among the 789 residents, the overall accuracy was 50.37%; seniority-specific accuracy was 43.49%, 51.30%, 54.91%, and 60.05% for 1st- through 4th-year residents, respectively. ChatGPT ranked 292nd among the residents. By difficulty, 11 questions were easy, 44 moderate, and 14 difficult, and ChatGPT's accuracy at each level was 63.63%, 54.54%, and 42.85%, respectively. The average FRE score of the responses generated by ChatGPT was 27.56 ± 12.40.
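
As a consistency check not stated in the abstract, the per-difficulty percentages are compatible with 7/11 easy, 24/44 moderate, and 6/14 difficult questions answered correctly, which sum to the overall 37/69 (53.62%); the abstract appears to truncate rather than round, hence 63.63% instead of 63.64%. A short Python sketch with these inferred counts:

# Back-of-the-envelope check: inferred correct counts per difficulty level,
# not reported in the abstract. Values here are rounded, not truncated.
totals = {"easy": 11, "moderate": 44, "difficult": 14}
correct = {"easy": 7, "moderate": 24, "difficult": 6}  # inferred, not stated

for level, n in totals.items():
    print(f"{level}: {correct[level]}/{n} = {100 * correct[level] / n:.2f}%")
print(f"overall: {sum(correct.values())}/{sum(totals.values())} = "
      f"{100 * sum(correct.values()) / sum(totals.values()):.2f}%")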

CONCLUSION

ChatGPT correctly answered 53.6% of the questions in an exam designed for residents, a success rate lower on average than that of a 3rd-year resident. The readability of the responses provided by ChatGPT is low, and they are difficult to understand. As question difficulty increases, ChatGPT's success decreases. Predictably, these results will change as more information is incorporated into ChatGPT.

