

Accuracy and Readability of Kidney Stone Patient Information Materials Generated by a Large Language Model Compared to Official Urologic Organizations.

Affiliations

Department of Urology, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Urological Sciences, University of British Columbia, Stone Centre at Vancouver General Hospital, Vancouver, British Columbia, Canada.

Department of Urological Sciences, University of British Columbia, Stone Centre at Vancouver General Hospital, Vancouver, British Columbia, Canada.

Publication Information

Urology. 2024 Apr;186:107-113. doi: 10.1016/j.urology.2023.11.042. Epub 2024 Feb 21.

DOI: 10.1016/j.urology.2023.11.042
PMID: 38395071
Abstract

OBJECTIVE

To compare the readability and accuracy of large language model generated patient information materials (PIMs) to those supplied by the American Urological Association (AUA), Canadian Urological Association (CUA), and European Association of Urology (EAU) for kidney stones.

METHODS

PIMs from AUA, CUA, and EAU related to nephrolithiasis were obtained and categorized. The most frequent patient questions related to kidney stones were identified from an internet query and input into GPT-3.5 and GPT-4. PIMs and ChatGPT outputs were assessed for accuracy and readability using previously published indexes. We also assessed changes in ChatGPT outputs when a reading level was specified (grade 6).
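The abstract does not name the specific readability indexes used. As an illustration of what a "grade level" score means, the following sketch computes the widely used Flesch-Kincaid grade level (an assumption, not necessarily the index the authors applied); the syllable counter is a naive vowel-group heuristic, not a validated one:

```python
import re

def count_syllables(word: str) -> int:
    """Naive heuristic: count contiguous vowel groups; minimum one syllable."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Short, simple sentences score at a low grade level;
# dense clinical prose scores far higher.
print(round(flesch_kincaid_grade(
    "Drink more water. Small stones often pass on their own."), 2))
```

A grade level of 10-12, as reported for the CUA materials, means the text is pitched at a high-school senior's reading ability, well above the roughly sixth-grade level often recommended for patient materials.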

RESULTS

Readability scores were better for PIMs from the CUA (grade level 10-12), AUA (8-10), or EAU (9-11) compared to the chatbot. GPT-3.5 had the worst readability scores, at grade 13-14, and GPT-4 was likewise less readable than urologic organization PIMs, with scores of 11-13. While organizational PIMs were deemed accurate, the chatbot had high accuracy with minor details omitted. GPT-4 was more accurate than GPT-3.5 on general stone information and on the dietary and medical management of kidney stones, while both models had the same accuracy on the surgical management of nephrolithiasis.

CONCLUSION

Current PIMs from major urologic organizations for kidney stones remain more readable than publicly available GPT outputs, but their reading level is still higher than that of the general population. Of the available PIMs for kidney stones, those from the AUA are the most readable. Although chatbot outputs for common kidney stone patient queries have a high degree of accuracy, with minor details omitted, it is important for clinicians to understand their strengths and limitations.


