ChatGPT-3.5 and Bing Chat in ophthalmology: an updated evaluation of performance, readability, and informative sources.

Author Information

Tao Brendan Ka-Lok, Hua Nicholas, Milkovich John, Micieli Jonathan Andrew

Affiliations

Faculty of Medicine, The University of British Columbia, 317-2194 Health Sciences Mall, Vancouver, BC, V6T 1Z3, Canada.

Temerty Faculty of Medicine, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada.

Publication Information

Eye (Lond). 2024 Jul;38(10):1897-1902. doi: 10.1038/s41433-024-03037-w. Epub 2024 Mar 20.

Abstract

BACKGROUND/OBJECTIVES: Experimental investigation. The integration of ChatGPT-4 (OpenAI) into Bing Chat (Microsoft) confers the capability to access online data past 2021. We investigate Bing Chat's performance against ChatGPT-3.5 on a multiple-choice ophthalmology examination.

SUBJECTS/METHODS: In August 2023, ChatGPT-3.5 and Bing Chat were evaluated against 913 questions derived from the American Academy of Ophthalmology's Basic and Clinical Science Course (BCSC) collection. For each response, the sub-topic, performance, Simple Measure of Gobbledygook (SMOG) readability score (an estimate of the years of education required to understand a given passage), and cited resources were collected. The primary outcomes were the comparative scores between models and, qualitatively, the resources referenced by Bing Chat. Secondary outcomes included performance stratified by response readability, question type (explicit or situational), and BCSC sub-topic.
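For readers unfamiliar with the readability metric, the SMOG grade is derived from sentence and polysyllable counts. The short Python sketch below is illustrative only: it uses the standard published SMOG formula but a crude vowel-group heuristic for syllable counting, not whatever tooling the authors actually used.

import math
import re

def count_syllables(word):
    # Crude heuristic: each run of consecutive vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_grade(text):
    # SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291,
    # i.e. the estimated years of education needed to understand the passage.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291

print(round(smog_grade("Glaucoma is a progressive optic neuropathy. "
                       "Intraocular pressure is the main modifiable risk factor."), 1))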

RESULTS

Across 913 questions, ChatGPT-3.5 scored 59.69% [95% CI 56.45, 62.94] while Bing Chat scored 73.60% [95% CI 70.69, 76.52]. Both models performed significantly better on explicit questions than on clinical-reasoning questions, and both performed better on general medicine questions than on the ophthalmology sub-sections. Bing Chat referenced 927 online entities and provided at least one citation for 836 of the 913 questions. Use of more reliable (peer-reviewed) sources was associated with a higher likelihood of a correct response. The most-cited resources were eyewiki.aao.org, aao.org, wikipedia.org, and ncbi.nlm.nih.gov. Bing Chat responses showed significantly better readability than those of ChatGPT-3.5, averaging a reading level of grade 11.4 [95% CI 7.14, 15.7] versus grade 12.4 [95% CI 8.77, 16.1], respectively (p < 0.0001, ρ = 0.25).
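As a rough sanity check, the headline percentages correspond to about 545/913 and 672/913 correct answers. The sketch below is illustrative only: the counts are back-calculated from the reported percentages and a simple Wald (normal-approximation) interval is assumed, so it reproduces the reported confidence intervals only approximately, since the abstract does not state the exact interval method used.

import math

def score_with_ci(correct, total, z=1.96):
    # Proportion correct with a Wald (normal-approximation) 95% CI, in percent.
    p = correct / total
    half = z * math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * (p - half), 100 * (p + half)

# Correct-answer counts back-calculated from the reported 59.69% and 73.60% of 913.
for model, correct in [("ChatGPT-3.5", 545), ("Bing Chat", 672)]:
    score, lo, hi = score_with_ci(correct, 913)
    print(f"{model}: {score:.2f}% [95% CI {lo:.2f}, {hi:.2f}]")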

CONCLUSIONS

The online access, improved readability, and citation feature of Bing Chat confer additional utility for ophthalmology learners. We recommend critical appraisal of cited sources when interpreting responses.
