
Quality of information and appropriateness of ChatGPT outputs for urology patients.

Author Information

Cocci Andrea, Pezzoli Marta, Lo Re Mattia, Russo Giorgio Ivan, Asmundo Maria Giovanna, Fode Mikkel, Cacciamani Giovanni, Cimino Sebastiano, Minervini Andrea, Durukan Emil

Affiliations

Urology Section, University of Florence, Florence, Italy.

Urology Section, University of Catania, Catania, Italy.

Publication Information

Prostate Cancer Prostatic Dis. 2024 Mar;27(1):103-108. doi: 10.1038/s41391-023-00705-y. Epub 2023 Jul 29.

Abstract

BACKGROUND

The proportion of health-related searches on the internet is continuously growing. ChatGPT, a natural language processing (NLP) tool created by OpenAI, has been gaining increasing user attention and could potentially serve as a source of information on health concerns. This study aims to analyze the quality and appropriateness of ChatGPT's responses to urology case studies compared to those of a urologist.

METHODS

Data from 100 patient case studies, comprising patient demographics, medical history, and urologic complaints, were inputted into ChatGPT one at a time. Each prompt asked for the most likely diagnosis, suggested examinations, and treatment options. The responses generated by ChatGPT were then compared to those of a board-certified urologist who was blinded to ChatGPT's output, and were graded on a 5-point Likert scale using accuracy, comprehensiveness, and clarity as criteria for appropriateness. The quality of information was graded with section 2 of the DISCERN tool, and readability was assessed using the Flesch Reading Ease (FRE) and Flesch-Kincaid Reading Grade Level (FKGL) formulas.
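The two readability metrics mentioned above follow the standard Flesch formulas, which combine average sentence length with average syllables per word. A minimal sketch of how such scores are computed (the vowel-group syllable counter is a naive heuristic of our own; the study does not state which implementation it used):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count runs of consecutive vowels, dropping a
    # silent trailing 'e'; every word counts as at least one syllable.
    word = word.lower()
    if word.endswith("e") and not word.endswith("le"):
        word = word[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def readability(text: str) -> tuple[float, float]:
    """Return (FRE, FKGL) for a passage of English text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # average words per sentence
    spw = syllables / len(words)   # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl
```

Lower FRE and higher FKGL mean harder text; an FRE of 18 and an FKGL near 16, as reported below, fall in the "very difficult" band typically associated with college-graduate reading material.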

RESULTS

Overall, 52% of responses were deemed appropriate. ChatGPT provided more appropriate responses for non-oncology conditions (58.5%) than for oncology (52.6%) and emergency urology cases (11.1%) (p = 0.03). The median DISCERN section 2 score was 15 (IQR = 5.3), corresponding to a quality rating of poor. ChatGPT's responses were written at a college-graduate reading level, with a median FRE score of 18 (IQR = 21) and a median FKGL score of 15.8 (IQR = 3).
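The median/IQR summaries reported above can be reproduced with Python's standard library. A sketch (the paper does not specify its statistics software, and quantile conventions differ slightly between implementations, so exact IQR values may vary):

```python
import statistics

def median_iqr(scores: list[float]) -> tuple[float, float]:
    # Median and interquartile range (Q3 - Q1) of a list of scores.
    # method="inclusive" interpolates between observed data points.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return statistics.median(scores), q3 - q1
```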

CONCLUSIONS

ChatGPT serves as an interactive tool for providing medical information online and could potentially improve health outcomes and patient satisfaction. Nevertheless, the insufficient appropriateness and poor quality of its responses to urology cases underscore the need to evaluate NLP-generated outputs thoroughly before using them to address health-related concerns.

