

Accuracy, readability, and understandability of large language models for prostate cancer information to the public.

Author Information

Hershenhouse Jacob S, Mokhtar Daniel, Eppler Michael B, Rodler Severin, Storino Ramacciotti Lorenzo, Ganjavi Conner, Hom Brian, Davis Ryan J, Tran John, Russo Giorgio Ivan, Cocci Andrea, Abreu Andre, Gill Inderbir, Desai Mihir, Cacciamani Giovanni E

Affiliations

USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.

Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA.

Publication Information

Prostate Cancer Prostatic Dis. 2024 May 14. doi: 10.1038/s41391-024-00826-y.

Abstract

BACKGROUND

Generative Pretrained Transformer (GPT) chatbots have gained popularity since the public release of ChatGPT. Studies have evaluated the ability of different GPT models to provide information about medical conditions. To date, no study has assessed the quality of ChatGPT outputs to prostate cancer-related questions from both the physician and public perspectives while optimizing outputs for patient consumption.

METHODS

Nine prostate cancer-related questions, identified through Google Trends (Global), were categorized into diagnosis, treatment, and postoperative follow-up. These questions were processed using ChatGPT 3.5, and the responses were recorded. Subsequently, these responses were fed back into ChatGPT to create simplified summaries understandable at a sixth-grade reading level. Readability of both the original ChatGPT responses and the layperson summaries was evaluated using validated readability tools. A survey was conducted among urology providers (urologists and urologists in training) to rate the original ChatGPT responses for accuracy, completeness, and clarity on a 5-point Likert scale. Furthermore, two independent reviewers evaluated the layperson summaries on a correctness trifecta: accuracy, completeness, and decision-making sufficiency. Public assessment of the simplified summaries' clarity and understandability was carried out through Amazon Mechanical Turk (MTurk); participants rated the clarity of each summary and demonstrated their understanding through a multiple-choice question.
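A minimal sketch of this two-step pipeline, assuming the openai (v1+) Python client and the textstat package; the model name, prompt wording, and sample question are illustrative assumptions, not the authors' exact protocol:

```python
# Sketch of the two-step pipeline: ask ChatGPT a patient question,
# re-prompt it for a sixth-grade summary, then score both texts with
# the six readability indices reported in the Results.
# Assumptions (not from the paper): openai>=1.0 client, textstat,
# and illustrative prompt wording / model name.
from openai import OpenAI
import textstat

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(prompt: str) -> str:
    """Send a single-turn chat prompt and return the text reply."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


question = "What are the treatment options for localized prostate cancer?"
original = ask(question)

# Step 2: feed the original response back for a layperson summary.
simplified = ask(
    "Rewrite the following text so it is understandable at a "
    f"sixth-grade reading level:\n\n{original}"
)

# Score both versions with validated readability indices.
for label, text in [("original", original), ("simplified", simplified)]:
    print(
        label,
        textstat.flesch_reading_ease(text),
        textstat.gunning_fog(text),
        textstat.flesch_kincaid_grade(text),
        textstat.coleman_liau_index(text),
        textstat.smog_index(text),
        textstat.automated_readability_index(text),
    )
```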

RESULTS

GPT-generated output was deemed correct by 71.7% to 94.3% of raters (36 urologists, 17 urology residents) across 9 scenarios. GPT-generated simplified layperson summaries of this output were rated as accurate in 8 of 9 (88.9%) scenarios and as sufficient for a patient to make a decision in 8 of 9 (88.9%) scenarios. Mean readability of the layperson summaries was better than that of the original GPT outputs ([original ChatGPT v. simplified ChatGPT, mean (SD), p-value] Flesch Reading Ease: 36.5 (9.1) v. 70.2 (11.2), p < 0.0001; Gunning Fog: 15.8 (1.7) v. 9.5 (2.0), p < 0.0001; Flesch-Kincaid Grade Level: 12.8 (1.2) v. 7.4 (1.7), p < 0.0001; Coleman-Liau: 13.7 (2.1) v. 8.6 (2.4), p = 0.0002; SMOG Index: 11.8 (1.2) v. 6.7 (1.8), p < 0.0001; Automated Readability Index: 13.1 (1.4) v. 7.5 (2.1), p < 0.0001). MTurk workers (n = 514) rated the layperson summaries as correct (89.5-95.7%) and correctly understood the content (63.0-87.4%).
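For orientation, the scores run in opposite directions: Flesch Reading Ease rises as text gets easier, while the grade-level indices (Flesch-Kincaid, Gunning Fog, Coleman-Liau, SMOG, ARI) fall. The two Flesch formulas below are the standard published definitions, reproduced here for reference rather than taken from the paper:

```latex
% Higher FRE = easier text; FKGL reports an approximate US school grade
\mathrm{FRE}  = 206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}}
\mathrm{FKGL} = 0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59
```

On these scales, the reported jump from FRE of about 36.5 (difficult, college-level) to about 70.2 (fairly easy) is consistent with the drop in grade level from roughly 12.8 to 7.4.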

CONCLUSION

GPT shows promise for delivering correct patient education on prostate cancer-related content, but the technology was not designed for delivering information to patients. Prompting the model for accuracy, completeness, clarity, and readability may enhance its utility in GPT-powered medical chatbots.

