Hirtsiefer Christopher, Nestler Tim, Eckrich Johanna, Beverungen Henrieke, Siech Carolin, Aksoy Cem, Leitsmann Marianne, Baunacke Martin, Uhlig Annemarie
Klinik und Poliklinik für Urologie, Universitätsklinikum Carl Gustav Carus Dresden, Dresden, Germany.
Klinik für Urologie, Bundeswehrzentralkrankenhaus Koblenz, Koblenz, Germany.
Eur Urol Open Sci. 2024 Nov 1;70:148-153. doi: 10.1016/j.euros.2024.10.015. eCollection 2024 Dec.
Patients struggle to classify symptoms, which hinders timely medical presentation. With 35-75% of patients seeking information online before consulting a health care professional, generative language-based artificial intelligence (AI), exemplified by ChatGPT-3.5 (GPT-3.5) from OpenAI, has emerged as an important source. The aim of our study was to evaluate the role of GPT-3.5 in triaging acute urological conditions to address a gap in current research.
We assessed GPT-3.5 performance in providing urological differential diagnoses (DD) and recommending a course of action (CoA). Six acute urological pathologies were identified for evaluation. Lay descriptions, sourced from patient forums, formed the basis for 472 queries that were independently entered by nine urologists. We evaluated the output in terms of compliance with the European Association of Urology (EAU) guidelines, the quality of the patient information using the validated DISCERN questionnaire, and a linguistic analysis.
The median GPT-3.5 ratings were 4/5 for DD and CoA, and 3/5 for overall information quality. English outputs received higher ratings than German outputs for DD (4.27 vs 3.95; p < 0.001) and CoA (4.25 vs 4.05; p < 0.005). There was no difference in performance between urgent and non-urgent cases. Analysis of the information quality revealed notable underperformance for source indication, risk assessment, and influence on quality of life.
Our results highlight the potential of GPT-3.5 as a triage system offering individualized, empathetic advice that is mostly aligned with the EAU guidelines and outscores other online information. Relevant shortcomings in information quality, especially for risk assessment, need to be addressed to enhance reliability. Broader transparency and quality improvements are needed before integration into patient care, primarily in English-speaking settings.
We looked at the performance of ChatGPT-3.5 for patients seeking urology advice. We entered more than 400 German and English inputs and assessed the possible diagnoses suggested by this artificial intelligence tool. ChatGPT-3.5 scored well in providing a complete list of possible diagnoses and recommending a course of action mostly in line with current guidelines. The quality of the information was good overall, but missing and unclear sources for the information can be a problem.