
Evaluation of the accuracy and readability of ChatGPT-4 and Google Gemini in providing information on retinal detachment: a multicenter expert comparative study.

Author Information

Strzalkowski Piotr, Strzalkowska Alicja, Chhablani Jay, Pfau Kristina, Errera Marie-Hélène, Roth Mathias, Schaub Friederike, Bechrakis Nikolaos E, Hoerauf Hans, Reiter Constantin, Schuster Alexander K, Geerling Gerd, Guthoff Rainer

Affiliations

Department of Ophthalmology, Medical Faculty and University Hospital Düsseldorf - Heinrich Heine University Düsseldorf, Düsseldorf, Germany.

UPMC Eye Center, University of Pittsburgh, Pittsburgh, PA, USA.

Publication Information

Int J Retina Vitreous. 2024 Sep 2;10(1):61. doi: 10.1186/s40942-024-00579-9.

DOI: 10.1186/s40942-024-00579-9
PMID: 39223678
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11367851/
Abstract

BACKGROUND

Large language models (LLMs) such as ChatGPT-4 and Google Gemini show potential for patient health education, but concerns about their accuracy require careful evaluation. This study evaluates the readability and accuracy of ChatGPT-4 and Google Gemini in answering questions about retinal detachment.

METHODS

A comparative study analyzed responses from ChatGPT-4 and Google Gemini to 13 retinal detachment questions, categorized by difficulty level (D1, D2, D3). Masked responses were reviewed by ten vitreoretinal specialists and rated on correctness, errors, thematic accuracy, coherence, and overall quality. Analysis included the Flesch Reading Ease Score as well as word and sentence counts.
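The readability metric used here, the Flesch Reading Ease Score, is computed as 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words); higher scores mean easier text, and college-level material typically scores below 50. A minimal sketch of the computation, assuming a naive vowel-group syllable heuristic (the study does not specify its tooling, and validated readability software uses more careful syllable counting):

```python
import re

def count_syllables(word: str) -> int:
    """Naive heuristic: count vowel groups, dropping a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Short, common words push the score up (toward easy), while long sentences of polysyllabic medical vocabulary push it down, which is why both chatbots' answers landed at college-level readability.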

RESULTS

Both Artificial Intelligence tools required college-level understanding for all difficulty levels. Google Gemini was easier to understand (p = 0.03), while ChatGPT-4 provided more correct answers for the more difficult questions (p = 0.0005) with fewer serious errors. ChatGPT-4 scored highest on most challenging questions, showing superior thematic accuracy (p = 0.003). ChatGPT-4 outperformed Google Gemini in 8 of 13 questions, with higher overall quality grades in the easiest (p = 0.03) and hardest levels (p = 0.0002), showing a lower grade as question difficulty increased.

CONCLUSIONS

ChatGPT-4 and Google Gemini effectively address queries about retinal detachment, offering mostly accurate answers with few critical errors, though patients require higher education for comprehension. The implementation of AI tools may contribute to improving medical care by providing accurate and relevant healthcare information quickly.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/197d/11367851/ad96071b6bab/40942_2024_579_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/197d/11367851/82258a625692/40942_2024_579_Fig2_HTML.jpg

Similar Articles

1. Evaluation of the accuracy and readability of ChatGPT-4 and Google Gemini in providing information on retinal detachment: a multicenter expert comparative study.
Int J Retina Vitreous. 2024 Sep 2;10(1):61. doi: 10.1186/s40942-024-00579-9.
2. Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o.
Clin Rheumatol. 2024 Nov;43(11):3507-3513. doi: 10.1007/s10067-024-07154-5. Epub 2024 Sep 28.
3. Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy.
Cureus. 2024 May 9;16(5):e59960. doi: 10.7759/cureus.59960. eCollection 2024 May.
4. Comparative Evaluation of AI Models Such as ChatGPT 3.5, ChatGPT 4.0, and Google Gemini in Neuroradiology Diagnostics.
Cureus. 2024 Aug 25;16(8):e67766. doi: 10.7759/cureus.67766. eCollection 2024 Aug.
5. Exploring AI-chatbots' capability to suggest surgical planning in ophthalmology: ChatGPT versus Google Gemini analysis of retinal detachment cases.
Br J Ophthalmol. 2024 Sep 20;108(10):1457-1469. doi: 10.1136/bjo-2023-325143.
6. Assessing the Readability of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study.
Cureus. 2024 Jul 4;16(7):e63865. doi: 10.7759/cureus.63865. eCollection 2024 Jul.
7. End-of-life Care Patient Information Leaflets-A Comparative Evaluation of Artificial Intelligence-generated Content for Readability, Sentiment, Accuracy, Completeness, and Suitability: ChatGPT vs Google Gemini.
Indian J Crit Care Med. 2024 Jun;28(6):561-568. doi: 10.5005/jp-journals-10071-24725.
8. Comparative performance analysis of large language models: ChatGPT-3.5, ChatGPT-4 and Google Gemini in glucocorticoid-induced osteoporosis.
J Orthop Surg Res. 2024 Sep 18;19(1):574. doi: 10.1186/s13018-024-04996-2.
9. Unlocking Health Literacy: The Ultimate Guide to Hypertension Education From ChatGPT Versus Google Gemini.
Cureus. 2024 May 8;16(5):e59898. doi: 10.7759/cureus.59898. eCollection 2024 May.
10. Comparison of Gemini Advanced and ChatGPT 4.0's Performances on the Ophthalmology Resident Ophthalmic Knowledge Assessment Program (OKAP) Examination Review Question Banks.
Cureus. 2024 Sep 17;16(9):e69612. doi: 10.7759/cureus.69612. eCollection 2024 Sep.

Cited By

1. Using Artificial Intelligence ChatGPT to Access Medical Information About Chemical Eye Injuries: Comparative Study.
JMIR Form Res. 2025 Aug 13;9:e73642. doi: 10.2196/73642.
2. Evaluating the readability, quality, and reliability of responses generated by ChatGPT, Gemini, and Perplexity on the most commonly asked questions about Ankylosing spondylitis.
PLoS One. 2025 Jun 18;20(6):e0326351. doi: 10.1371/journal.pone.0326351. eCollection 2025.
3. Enhancing ophthalmology students' awareness of retinitis pigmentosa: assessing the efficacy of ChatGPT in AI-assisted teaching of rare diseases-a quasi-experimental study.
Front Med (Lausanne). 2025 Mar 18;12:1534294. doi: 10.3389/fmed.2025.1534294. eCollection 2025.
4. Evaluating the Accuracy, Reliability, Consistency, and Readability of Different Large Language Models in Restorative Dentistry.
J Esthet Restor Dent. 2025 Jul;37(7):1740-1752. doi: 10.1111/jerd.13447. Epub 2025 Mar 2.
5. A Comparative Analysis of Artificial Intelligence Platforms: ChatGPT-4o and Google Gemini in Answering Questions About Birth Control Methods.
Cureus. 2025 Jan 1;17(1):e76745. doi: 10.7759/cureus.76745. eCollection 2025 Jan.
6. Automated and code-free development of a risk calculator using ChatGPT-4 for predicting diabetic retinopathy and macular edema without retinal imaging.
Int J Retina Vitreous. 2025 Jan 31;11(1):11. doi: 10.1186/s40942-025-00638-9.

References

1. Performance of an Upgraded Artificial Intelligence Chatbot for Ophthalmic Knowledge Assessment.
JAMA Ophthalmol. 2023 Aug 1;141(8):798-800. doi: 10.1001/jamaophthalmol.2023.2754.
2. Evaluating the Performance of ChatGPT in Ophthalmology: An Analysis of Its Successes and Shortcomings.
Ophthalmol Sci. 2023 May 5;3(4):100324. doi: 10.1016/j.xops.2023.100324. eCollection 2023 Dec.
3. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine.
N Engl J Med. 2023 Mar 30;388(13):1233-1239. doi: 10.1056/NEJMsr2214184.
4. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models.
PLOS Digit Health. 2023 Feb 9;2(2):e0000198. doi: 10.1371/journal.pdig.0000198. eCollection 2023 Feb.
5. ChatGPT: the future of discharge summaries?
Lancet Digit Health. 2023 Mar;5(3):e107-e108. doi: 10.1016/S2589-7500(23)00021-3. Epub 2023 Feb 6.
6. How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.
JMIR Med Educ. 2023 Feb 8;9:e45312. doi: 10.2196/45312.
7. ChatGPT: five priorities for research.
Nature. 2023 Feb;614(7947):224-226. doi: 10.1038/d41586-023-00288-7.
8. A Comparison of Artificial Intelligence and Human Doctors for the Purpose of Triage and Diagnosis.
Front Artif Intell. 2020 Nov 30;3:543405. doi: 10.3389/frai.2020.543405. eCollection 2020.
9. Sociodemographic Factors Influencing Rhegmatogenous Retinal Detachment Presentation and Outcome.
Ophthalmol Retina. 2021 Apr;5(4):337-341. doi: 10.1016/j.oret.2020.08.001. Epub 2020 Aug 6.
10. Factors affecting visual recovery after successful repair of macula-off retinal detachments: findings from a large prospective UK cohort study.
Eye (Lond). 2021 May;35(5):1431-1439. doi: 10.1038/s41433-020-1021-y. Epub 2020 Jun 24.