Ayoub Noel F, Lee Yu-Jin, Grimm David, Divi Vasu
Department of Otolaryngology-Head and Neck Surgery, Division of Head & Neck Surgery, Stanford University School of Medicine, Stanford, California, USA.
Otolaryngol Head Neck Surg. 2024 Jun;170(6):1484-1491. doi: 10.1002/ohn.465. Epub 2023 Aug 2.
OBJECTIVE: Chat Generative Pretrained Transformer (ChatGPT) is the newest iteration of OpenAI's generative artificial intelligence (AI), with the potential to influence many facets of life, including health care. This study sought to assess ChatGPT's capabilities as a source of medical knowledge, using Google Search as a comparison.
STUDY DESIGN: Cross-sectional analysis.
SETTING: Online, using ChatGPT, Google Search, and Clinical Practice Guidelines (CPGs).
METHODS: CPG Plain Language Summaries for 6 conditions were obtained. Questions relevant to the specific conditions were developed and input into ChatGPT and Google Search. All questions were written from the patient perspective and sought (1) general medical knowledge or (2) medical recommendations, with varying levels of acuity (urgent or emergent vs routine clinical scenarios). Two blinded reviewers scored all passages and compared the results from ChatGPT and Google Search, using the Patient Education Material Assessment Tool (PEMAT-P) as the primary outcome. Additional customized questions assessing the medical content of the passages were also developed.
RESULTS: The overall average PEMAT-P score for medical advice was 68.2% (standard deviation [SD]: 4.4) for ChatGPT and 89.4% (SD: 5.9) for Google Search (p < .001). There was a statistically significant difference in PEMAT-P scores by source (p < .001) but not by urgency of the clinical situation (p = .613). For patient education questions, ChatGPT scored significantly higher than Google Search (87% vs 78%, p = .012).
CONCLUSION: ChatGPT fared better than Google Search when offering general medical knowledge, but it scored worse when providing medical recommendations. Health care providers should strive to understand the potential benefits and ramifications of generative AI in order to guide patients appropriately.