
Appropriateness of Ophthalmology Recommendations From an Online Chat-Based Artificial Intelligence Model.

Author Information

Tailor Prashant D, Xu Timothy T, Fortes Blake H, Iezzi Raymond, Olsen Timothy W, Starr Matthew R, Bakri Sophie J, Scruggs Brittni A, Barkmeier Andrew J, Patel Sanjay V, Baratz Keith H, Bernhisel Ashlie A, Wagner Lilly H, Tooley Andrea A, Roddy Gavin W, Sit Arthur J, Wu Kristi Y, Bothun Erick D, Mansukhani Sasha A, Mohney Brian G, Chen John J, Brodsky Michael C, Tajfirouz Deena A, Chodnicki Kevin D, Smith Wendy M, Dalvin Lauren A

Affiliations

Department of Ophthalmology (P.D.T., T.T.X., B.H.F., R.I., T.W.O., M.R.S., S.J.B., B.A.S., A.J.B., S.V.P., K.H.B., A.A.B., L.H.W., A.A.T., G.W.R., A.J.S., K.Y.W., E.D.B., S.A.M., B.G.M., J.J.C., M.C.B., D.A.T., K.D.C., W.M.S., L.A.D.) and Department of Neurology (J.J.C.), Mayo Clinic, Rochester, MN; and Department of Ophthalmology, Duke University, Durham, NC (K.Y.W.).

Publication Information

Mayo Clin Proc Digit Health. 2024 Mar;2(1):119-128. doi: 10.1016/j.mcpdig.2024.01.003. Epub 2024 Feb 15.

Abstract

OBJECTIVE

To determine the appropriateness of the ophthalmology recommendations that an online chat-based artificial intelligence model gives in response to ophthalmology questions.

PATIENTS AND METHODS

Cross-sectional qualitative study conducted from April 1, 2023, to April 30, 2023. A total of 192 questions were generated spanning all ophthalmic subspecialties. Each question was posed to a large language model (LLM) 3 times. The responses were graded by the appropriate subspecialists as appropriate, inappropriate, or unreliable in 2 grading contexts. In the first, responses were evaluated as information presented on a patient information site; in the second, as LLM-generated draft replies to patient queries sent through the electronic medical record (EMR). Appropriate was defined as accurate and specific enough to serve as a surrogate for physician-approved information. The main outcome measure was the percentage of appropriate responses per subspecialty.
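The data-collection protocol amounts to a simple loop: each of the 192 questions is posed to the chat model 3 times and the responses are stored for subspecialist grading. A minimal Python sketch of that loop follows; `ask_model`, `QuestionRecord`, and `collect_responses` are hypothetical names introduced here for illustration, since the study used an online chat interface rather than a documented programmatic API.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the study's chat interface; the paper queried an
# online chat-based model, not a documented programmatic API.
def ask_model(question: str) -> str:
    raise NotImplementedError("replace with a real chat-model call")

@dataclass
class QuestionRecord:
    subspecialty: str              # e.g., "glaucoma", "uveitis"
    category: str                  # e.g., "disease and condition", "surgery-related"
    text: str
    responses: list[str] = field(default_factory=list)

def collect_responses(questions: list[QuestionRecord], repeats: int = 3) -> None:
    """Pose each question to the model `repeats` times (3 in the study)
    and accumulate the raw responses for later grading."""
    for q in questions:
        for _ in range(repeats):
            q.responses.append(ask_model(q.text))
```

Each collected response would then be graded twice by the relevant subspecialist, once per grading context, before aggregation.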

RESULTS

For patient information site-related questions, the LLM provided an overall average of 79% appropriate responses. Average appropriateness varied across ophthalmic subspecialties for patient information site information, ranging from 56% to 100%: cataract or refractive (92%), cornea (56%), glaucoma (72%), neuro-ophthalmology (67%), oculoplastic or orbital surgery (80%), ocular oncology (100%), pediatrics (89%), vitreoretinal diseases (86%), and uveitis (65%). For draft responses to patient questions via the EMR, the LLM provided an overall average of 74% appropriate responses, again varying by subspecialty: cataract or refractive (85%), cornea (54%), glaucoma (77%), neuro-ophthalmology (63%), oculoplastic or orbital surgery (62%), ocular oncology (90%), pediatrics (94%), vitreoretinal diseases (88%), and uveitis (55%). Stratifying grades across health information categories (disease and condition, risk and prevention, surgery-related, and treatment and management) showed notable, though not statistically significant, variation: disease and condition was rated highest for appropriateness (72% and 69%) and surgery-related lowest (55% and 51%) in the two contexts.
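The per-subspecialty percentages above are straightforward grade tallies: appropriate responses divided by total responses within each subspecialty-context stratum. A minimal sketch of that aggregation, assuming grades are stored as (subspecialty, context, grade) tuples (a representation chosen here for illustration, not specified by the paper):

```python
from collections import defaultdict

def appropriateness_by_subspecialty(grades):
    """grades: iterable of (subspecialty, context, grade) tuples, where grade
    is "appropriate", "inappropriate", or "unreliable".
    Returns {(subspecialty, context): percent of responses graded appropriate}."""
    totals = defaultdict(int)
    appropriate = defaultdict(int)
    for subspecialty, context, grade in grades:
        key = (subspecialty, context)
        totals[key] += 1
        if grade == "appropriate":
            appropriate[key] += 1
    return {key: 100 * appropriate[key] / totals[key] for key in totals}

# Example: 3 graded responses to one glaucoma question in the
# patient-information-site context.
demo = [("glaucoma", "patient_site", "appropriate"),
        ("glaucoma", "patient_site", "appropriate"),
        ("glaucoma", "patient_site", "unreliable")]
print(appropriateness_by_subspecialty(demo))
# {('glaucoma', 'patient_site'): 66.66...}
```

Note that both non-appropriate grades ("inappropriate" and "unreliable") count against the appropriateness percentage, matching how the study reports a single percent-appropriate figure per stratum.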

CONCLUSION

This LLM provided mostly appropriate responses across multiple ophthalmology subspecialties, both as patient information site content and as EMR draft responses to patient questions. Current LLM offerings require optimization and improvement before widespread clinical use.

