Performance of ChatGPT and Bard on the official part 1 FRCOphth practice questions.

Affiliations

Department of Medicine, Barking Havering and Redbridge University Hospitals NHS Trust, London, UK

Department of Anaesthetics, Princess Alexandra Hospital, Harlow, UK.

Publication information

Br J Ophthalmol. 2024 Sep 20;108(10):1379-1383. doi: 10.1136/bjo-2023-324091.


DOI:10.1136/bjo-2023-324091
PMID:37932006
Abstract

BACKGROUND: Chat Generative Pre-trained Transformer (ChatGPT), a large language model by OpenAI, and Bard, Google's artificial intelligence (AI) chatbot, have been evaluated in various contexts. This study aims to assess these models' proficiency in the part 1 Fellowship of the Royal College of Ophthalmologists (FRCOphth) Multiple Choice Question (MCQ) examination, highlighting their potential in medical education.

METHODS: Both models were tested on a sample question bank for the part 1 FRCOphth MCQ exam. Their performances were compared with historical human performance on the exam, focusing on the ability to comprehend, retain and apply information related to ophthalmology. We also tested it on the book 'MCQs for FRCOphth part 1', and assessed its performance across subjects.

RESULTS: ChatGPT demonstrated a strong performance, surpassing historical human pass marks and examination performance, while Bard underperformed. The comparison indicates the potential of certain AI models to match, and even exceed, human standards in such tasks.

CONCLUSION: The results demonstrate the potential of AI models, such as ChatGPT, in processing and applying medical knowledge at a postgraduate level. However, performance varied among different models, highlighting the importance of appropriate AI selection. The study underlines the potential for AI applications in medical education and the necessity for further investigation into their strengths and limitations.
