Morgan State University, Baltimore, MD.
Harvard Medical School, Boston, MA.
Cornea. 2024 Jun 1;43(6):746-750. doi: 10.1097/ICO.0000000000003439. Epub 2023 Nov 28.
ChatGPT is commonly used by patients and clinicians as a source of information. However, it can be prone to error and requires validation. We sought to assess the quality and accuracy of information on corneal transplantation and Fuchs dystrophy provided by 2 iterations of ChatGPT, and whether its answers improve over time.
A total of 10 corneal specialists collaborated to assess the model's responses to 10 commonly asked questions related to endothelial keratoplasty and Fuchs dystrophy. The questions were posed to both ChatGPT-3.5 and its newer iteration, GPT-4. Assessments evaluated the quality, safety, accuracy, and bias of the information. Chi-squared tests, Fisher exact tests, and regression analyses were conducted.
We analyzed 180 valid responses. On a scale of 1 (A+) to 5 (F), the average score given by all specialists across questions was 2.5 for ChatGPT-3.5 and 1.4 for GPT-4, a significant improvement (P < 0.0001). Most responses from both ChatGPT-3.5 (61%) and GPT-4 (89%) used correct facts, a proportion that improved significantly across iterations (P < 0.00001). Approximately one third (35%) of responses from ChatGPT-3.5 were judged to go against the scientific consensus, a notable error rate that decreased to only 5% of answers from GPT-4 (P < 0.00001).
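The headline model-vs-model comparison can be illustrated from the reported proportions. Below is a minimal sketch in Python using scipy.stats, assuming roughly 90 valid responses per model (180 total, split evenly); the counts are back-calculated from the published 61% and 89% for illustration and are not the study's raw data.

```python
# Illustrative reconstruction of the factual-correctness comparison.
# Assumes ~90 valid responses per model (180 total, split evenly);
# counts are derived from the reported percentages, not study data.
from scipy.stats import chi2_contingency, fisher_exact

n_per_model = 90
correct_35 = round(0.61 * n_per_model)  # ~55 correct (ChatGPT-3.5)
correct_4 = round(0.89 * n_per_model)   # ~80 correct (GPT-4)

# 2x2 contingency table: rows = model, columns = correct / incorrect
table = [
    [correct_35, n_per_model - correct_35],
    [correct_4, n_per_model - correct_4],
]

chi2, p_chi2, dof, expected = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)

print(f"chi-squared = {chi2:.2f}, p = {p_chi2:.2g}")
print(f"Fisher exact odds ratio = {odds_ratio:.2f}, p = {p_fisher:.2g}")
```

With these assumed counts, both tests return P well below 0.001, consistent with the direction of the reported result; the Fisher exact test is the usual complement to chi-squared when expected cell counts are small.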
The quality of ChatGPT's responses improved significantly between versions 3.5 and 4, and the odds of providing information against the scientific consensus decreased. However, the technology is still capable of producing inaccurate statements. Corneal specialists are uniquely positioned to help users discern the veracity and applicability of such information.