

Analysis of ChatGPT Responses to Ophthalmic Cases: Can ChatGPT Think like an Ophthalmologist?

Author Information

Chen Jimmy S, Reddy Akshay J, Al-Sharif Eman, Shoji Marissa K, Kalaw Fritz Gerald P, Eslani Medi, Lang Paul Z, Arya Malvika, Koretz Zachary A, Bolo Kyle A, Arnett Justin J, Roginiel Aliya C, Do Jiun L, Robbins Shira L, Camp Andrew S, Scott Nathan L, Rudell Jolene C, Weinreb Robert N, Baxter Sally L, Granet David B

Affiliations

Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California, San Diego, La Jolla, California.

UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California.

Publication Information

Ophthalmol Sci. 2024 Aug 23;5(1):100600. doi: 10.1016/j.xops.2024.100600. eCollection 2025 Jan-Feb.

Abstract

OBJECTIVE

Large language models such as ChatGPT have demonstrated significant potential in question-answering within ophthalmology, but there is a paucity of literature evaluating their ability to generate clinical assessments and discussions. The objectives of this study were to (1) assess the accuracy of assessments and plans generated by ChatGPT and (2) evaluate ophthalmologists' abilities to distinguish between responses generated by clinicians versus ChatGPT.

DESIGN

Cross-sectional mixed-methods study.

SUBJECTS

Sixteen ophthalmologists from a single academic center, of whom 10 were board-eligible and 6 were board-certified, were recruited to participate in this study.

METHODS

Prompt engineering was used to ensure ChatGPT output discussions in the style of the ophthalmologist author of the Medical College of Wisconsin Ophthalmic Case Studies. Cases where ChatGPT accurately identified the primary diagnoses were included and then paired. Masked human-generated and ChatGPT-generated discussions were sent to participating ophthalmologists to identify the author of the discussions. Response confidence was assessed using a 5-point Likert scale score, and subjective feedback was manually reviewed.
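The masking step described above can be sketched in code: for each included case, the human-written and ChatGPT-written discussions are presented in a random order so reviewers cannot infer authorship from position, while an answer key is retained for scoring. This is an illustrative sketch; the function name, data layout, and seeding are assumptions, not details from the paper.

```python
import random


def build_masked_pairs(cases, seed=0):
    """Randomly order the human- and ChatGPT-written discussions for
    each case, returning the masked pairs plus an answer key.

    `cases` is a list of (case_id, human_text, chatgpt_text) tuples.
    Returns (masked, key): masked holds (case_id, discussion_a,
    discussion_b); key records which author wrote discussion A.
    """
    rng = random.Random(seed)  # fixed seed keeps the ordering reproducible
    masked, key = [], []
    for case_id, human_text, chatgpt_text in cases:
        order = [("human", human_text), ("chatgpt", chatgpt_text)]
        rng.shuffle(order)  # mask authorship by randomizing position
        masked.append((case_id, order[0][1], order[1][1]))
        key.append((case_id, order[0][0]))  # author of discussion A
    return masked, key
```

Scoring a reviewer's guesses then reduces to comparing them against the retained key, case by case.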

MAIN OUTCOME MEASURES

Accuracy of ophthalmologist identification of discussion author, as well as subjective perceptions of human-generated versus ChatGPT-generated discussions.

RESULTS

Overall, ChatGPT correctly identified the primary diagnosis in 15 of 17 (88.2%) cases. Two cases were excluded from the paired comparison due to hallucinations or fabrications of nonuser-provided data. Ophthalmologists correctly identified the author in 77.9% ± 26.6% of the 13 included cases, with a mean Likert scale confidence rating of 3.6 ± 1.0. No significant differences in performance or confidence were found between board-certified and board-eligible ophthalmologists. Subjectively, ophthalmologists found that discussions written by ChatGPT tended to contain more generic responses and irrelevant information, hallucinated more frequently, and had distinct syntactic patterns (all P < 0.01).
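The headline figures above are simple proportions and mean ± SD aggregates. A minimal sketch of that arithmetic follows; the per-reviewer rates in the second example are hypothetical placeholders, not the study's raw data.

```python
from statistics import mean, stdev


def pct(numerator, denominator):
    """Express a proportion as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Primary-diagnosis accuracy reported in the abstract: 15 of 17 cases.
print(pct(15, 17))  # 88.2

# Reviewer identification accuracy is reported as mean +/- SD across
# ophthalmologists; the rates below are hypothetical illustrations.
rates = [0.92, 0.54, 0.77, 0.85]
print(round(mean(rates), 2), "+/-", round(stdev(rates), 2))
```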

CONCLUSIONS

Large language models have the potential to synthesize clinical data and generate ophthalmic discussions. While these findings have exciting implications for artificial intelligence-assisted health care delivery, more rigorous real-world evaluation of these models is necessary before clinical deployment.

FINANCIAL DISCLOSURES

The author(s) have no proprietary or commercial interest in any materials discussed in this article.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5daa/11437840/a730405c59e1/gr1.jpg
