A Qualitative Evaluation of ChatGPT4 and PaLM2's Response to Patient's Questions Regarding Age-Related Macular Degeneration.

Authors

Muntean George Adrian, Marginean Anca, Groza Adrian, Damian Ioana, Roman Sara Alexia, Hapca Mădălina Claudia, Sere Anca Mădălina, Mănoiu Roxana Mihaela, Muntean Maximilian Vlad, Nicoară Simona Delia

Affiliations

Department of Ophthalmology, "Iuliu Hatieganu" University of Medicine and Pharmacy, Emergency County Hospital, 400347 Cluj-Napoca, Romania.

Department of Computer Science, Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania.

Publication information

Diagnostics (Basel). 2024 Jul 9;14(14):1468. doi: 10.3390/diagnostics14141468.

DOI:10.3390/diagnostics14141468
PMID:39061606
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11275354/
Abstract

Patient compliance is essential for managing chronic illnesses. This also applies to age-related macular degeneration (AMD), a chronic acquired retinal degeneration that requires constant monitoring and patient cooperation. Patients with AMD can therefore benefit from being properly informed about their disease, regardless of its stage. Information is essential for keeping them compliant with lifestyle changes, regular monitoring, and treatment. Large language models (LLMs) have shown potential in numerous fields, including medicine, with remarkable use cases. In this paper, we assessed the capacity of two LLMs, ChatGPT4 and PaLM2, to answer questions frequently asked by patients with AMD. After searching AMD-patient-dedicated websites for frequently asked questions, we curated and selected 143 questions. The questions were then transformed into scenarios that were answered by ChatGPT4, PaLM2, and three ophthalmologists. Afterwards, two ophthalmologists evaluated the answers that the two LLMs gave to a set of 133 questions, grading each answer on a five-point Likert scale. The models were evaluated on six qualitative criteria: (C1) reflects clinical and scientific consensus, (C2) likelihood of possible harm, (C3) evidence of correct reasoning, (C4) evidence of correct comprehension, (C5) evidence of correct retrieval, and (C6) missing content. Out of 133 questions, both reviewers gave ChatGPT4 a score of five on 118 questions (88.72%) for C1, 130 (97.74%) for C2, 131 (98.50%) for C3, 133 (100%) for C4, 132 (99.25%) for C5, and 122 (91.73%) for C6. For PaLM2, the corresponding counts were 81 (60.90%) for C1, 114 (85.71%) for C2, 115 (86.47%) for C3, 124 (93.23%) for C4, 113 (84.97%) for C5, and 93 (69.92%) for C6.
Despite the overall high performance, some answers were incomplete or inaccurate, and the paper explores the types of errors these LLMs produce. Our study shows that ChatGPT4 and PaLM2 are valuable instruments for patient information and education; however, because these models still have limitations, they should be used in addition to physicians' advice to ensure patients are properly informed.
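The headline figures above are counts of questions, out of the 133 evaluated, for which both reviewers gave the top Likert score of 5, expressed as a percentage of 133. The Python sketch below recomputes those percentages from the reported counts as a consistency check. The helper names (`both_top_score`, `percent`, `mismatches`) and the `(reviewer_1, reviewer_2)` score-pair layout are illustrative assumptions, not the authors' analysis code, and the toy score pairs are invented for demonstration only. Recomputing 113/133 gives 84.96%, whereas the abstract prints 84.97% for PaLM2 on C5, a 0.01-point difference that looks like a rounding slip; every other reported value matches.

```python
from typing import Iterable

TOTAL_QUESTIONS = 133  # questions whose LLM answers were graded, per the abstract
TOP_SCORE = 5          # highest point on the five-point Likert scale


def both_top_score(score_pairs: Iterable[tuple[int, int]]) -> int:
    """Count questions where both reviewers gave the top score.

    Hypothetical helper: assumes one (reviewer_1, reviewer_2) score pair
    per question, for a single model and a single criterion.
    """
    return sum(1 for a, b in score_pairs if a == TOP_SCORE and b == TOP_SCORE)


def percent(count: int, total: int = TOTAL_QUESTIONS) -> str:
    """Format count/total as a percentage with two decimals, as in the abstract."""
    return f"{100 * count / total:.2f}"


# Counts and percentages as printed in the abstract: criterion -> (count, reported %).
# The abstract's "100%" is written as "100.00" so it compares like the others.
REPORTED = {
    "ChatGPT4": {"C1": (118, "88.72"), "C2": (130, "97.74"), "C3": (131, "98.50"),
                 "C4": (133, "100.00"), "C5": (132, "99.25"), "C6": (122, "91.73")},
    "PaLM2": {"C1": (81, "60.90"), "C2": (114, "85.71"), "C3": (115, "86.47"),
              "C4": (124, "93.23"), "C5": (113, "84.97"), "C6": (93, "69.92")},
}


def mismatches() -> list[tuple[str, str, str, str]]:
    """Return (model, criterion, reported, recomputed) wherever the two differ."""
    out = []
    for model, criteria in REPORTED.items():
        for criterion, (count, reported) in criteria.items():
            recomputed = percent(count)
            if recomputed != reported:
                out.append((model, criterion, reported, recomputed))
    return out


if __name__ == "__main__":
    # Toy aggregation: three invented questions, only the first is 5/5.
    print(both_top_score([(5, 5), (5, 4), (3, 5)]))  # 1
    for row in mismatches():
        print(row)  # ('PaLM2', 'C5', '84.97', '84.96')
```

Comparing formatted strings rather than floats avoids spurious mismatches from binary floating-point representation.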

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b745/11275354/967b9bca201d/diagnostics-14-01468-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b745/11275354/6563e0eda60f/diagnostics-14-01468-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b745/11275354/f1af0f72628c/diagnostics-14-01468-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b745/11275354/a5bfcb9a4ed4/diagnostics-14-01468-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b745/11275354/851e2a263ff2/diagnostics-14-01468-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b745/11275354/b228adba8db0/diagnostics-14-01468-g005.jpg
