A Qualitative Evaluation of ChatGPT4 and PaLM2's Response to Patient's Questions Regarding Age-Related Macular Degeneration.

Authors

Muntean George Adrian, Marginean Anca, Groza Adrian, Damian Ioana, Roman Sara Alexia, Hapca Mădălina Claudia, Sere Anca Mădălina, Mănoiu Roxana Mihaela, Muntean Maximilian Vlad, Nicoară Simona Delia

Affiliations

Department of Ophthalmology, "Iuliu Hatieganu" University of Medicine and Pharmacy, Emergency County Hospital, 400347 Cluj-Napoca, Romania.

Department of Computer Science, Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania.

Publication

Diagnostics (Basel). 2024 Jul 9;14(14):1468. doi: 10.3390/diagnostics14141468.

Abstract

Patient compliance is essential for the management of chronic illnesses. This also applies to age-related macular degeneration (AMD), a chronic acquired retinal degeneration that requires constant monitoring and patient cooperation. Patients with AMD therefore benefit from being properly informed about their disease, regardless of its stage, because information is essential for keeping them compliant with lifestyle changes, regular monitoring, and treatment. Large language models (LLMs) have shown potential in numerous fields, including medicine, with notable use cases. In this paper, we assessed the capacity of two LLMs, ChatGPT4 and PaLM2, to answer questions frequently asked by patients with AMD. After searching websites dedicated to patients with AMD for frequently asked questions, we curated a set of 143 questions. The questions were then transformed into scenarios that were answered by ChatGPT4, PaLM2, and three ophthalmologists. Afterwards, the answers provided by the two LLMs to a set of 133 questions were evaluated by two ophthalmologists, who graded each answer on a five-point Likert scale. The models were evaluated on six qualitative criteria: (C1) reflects clinical and scientific consensus, (C2) likelihood of possible harm, (C3) evidence of correct reasoning, (C4) evidence of correct comprehension, (C5) evidence of correct retrieval, and (C6) missing content. Out of 133 questions, ChatGPT4 received a score of five from both reviewers for 118 questions (88.72%) on C1, 130 (97.74%) on C2, 131 (98.50%) on C3, 133 (100%) on C4, 132 (99.25%) on C5, and 122 (91.73%) on C6, while PaLM2 received a score of five from both reviewers for 81 questions (60.90%) on C1, 114 (85.71%) on C2, 115 (86.47%) on C3, 124 (93.23%) on C4, 113 (84.97%) on C5, and 93 (69.92%) on C6. Despite the overall high performance, some answers were incomplete or inaccurate, and the paper explores the types of errors produced by these LLMs. Our study indicates that ChatGPT4 and PaLM2 are valuable instruments for patient information and education; however, because these models still have limitations, they should be used in addition to the advice provided by physicians.
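The headline figures above are counts of answers that both reviewers graded five, expressed as a share of the 133 evaluated questions. The following is a minimal sketch of that arithmetic, not the authors' analysis code: the function names `both_top_count` and `percent` are hypothetical, and the per-answer grade lists are invented toy data, since the abstract reports only aggregate counts.

```python
from typing import Sequence


def both_top_count(reviewer_a: Sequence[int],
                   reviewer_b: Sequence[int],
                   top: int = 5) -> int:
    """Count answers that both reviewers graded with the top Likert score."""
    if len(reviewer_a) != len(reviewer_b):
        raise ValueError("both reviewers must grade the same answers")
    return sum(1 for a, b in zip(reviewer_a, reviewer_b)
               if a == top and b == top)


def percent(count: int, total: int) -> float:
    """Share of answers as a percentage, rounded to two decimals."""
    return round(100 * count / total, 2)


# Hypothetical grades for five answers on one criterion (not study data).
grades_a = [5, 5, 4, 5, 3]
grades_b = [5, 4, 4, 5, 5]
print(both_top_count(grades_a, grades_b))

# Counts reported in the abstract for criterion C1, out of 133 questions.
print(percent(118, 133))  # ChatGPT4
print(percent(81, 133))   # PaLM2
```

Counting only answers where both reviewers agree on the top score is a strict summary: a single four from either ophthalmologist excludes the answer, so the percentages are a lower bound on how often either reviewer alone gave a five.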

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b745/11275354/6563e0eda60f/diagnostics-14-01468-g001.jpg
