
Using artificial intelligence to generate medical literature for urology patients: a comparison of three different large language models.

Affiliations

School of Medicine, University College Cork, Cork, Ireland.

Department of Urology, Mercy University Hospital, Cork, Ireland.

Publication Information

World J Urol. 2024 Jul 29;42(1):455. doi: 10.1007/s00345-024-05146-3.

DOI: 10.1007/s00345-024-05146-3
PMID: 39073590
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11286728/
Abstract

PURPOSE

Large language models (LLMs) are a form of artificial intelligence (AI) that uses deep learning techniques to understand, summarize and generate content. The potential benefits of LLMs in healthcare are predicted to be immense. The objective of this study was to examine the quality of patient information leaflets (PILs) produced by 3 LLMs on urological topics.

METHODS

Prompts were created to generate PILs from 3 LLMs (ChatGPT-4, PaLM 2 (Google Bard) and Llama 2 (Meta)) across four urology topics: circumcision, nephrectomy, overactive bladder syndrome, and transurethral resection of the prostate (TURP). PILs were evaluated using a quality assessment checklist, and PIL readability was assessed with the Average Reading Level Consensus Calculator.
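The generation setup described above is a simple cross of models and topics. The sketch below illustrates that design in Python; the model list and topics come from the paper, but the prompt wording and the `build_prompt` helper are hypothetical, since the study does not publish its exact prompts.

```python
# Sketch of the study's 3-model x 4-topic generation design.
# The prompt text here is illustrative only, not the authors' actual prompt.
MODELS = ["ChatGPT-4", "PaLM 2", "Llama 2"]
TOPICS = [
    "circumcision",
    "nephrectomy",
    "overactive bladder syndrome",
    "transurethral resection of the prostate",
]

def build_prompt(topic: str) -> str:
    """Build one PIL-generation prompt for a given urology topic."""
    return (
        f"Write a patient information leaflet about {topic}. "
        "Explain the procedure or condition, its risks, benefits, and "
        "aftercare in plain language suitable for patients."
    )

# One generation job per (model, topic) pair: 3 x 4 = 12 PILs in total.
jobs = [(model, build_prompt(topic)) for model in MODELS for topic in TOPICS]
print(len(jobs))  # 12
```

Each of the 12 generated leaflets would then be scored against the quality checklist and the readability calculator.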

RESULTS

PILs generated by PaLM 2 had the highest overall average quality score (3.58), followed by Llama 2 (3.34) and ChatGPT-4 (3.08). PaLM 2-generated PILs were of the highest quality in all topics except TURP, and PaLM 2 was the only LLM to include images. Medical inaccuracies were present in all generated content, including instances of significant error. Readability analysis identified PaLM 2-generated PILs as the simplest (average reading level of age 14-15); Llama 2 PILs were the most difficult (age 16-17 average).

CONCLUSION

While LLMs can generate PILs that may help reduce healthcare professional workload, generated content requires clinician input to ensure accuracy and the inclusion of health literacy aids, such as images. LLM-generated PILs were above the average reading level for adults, necessitating improvement in LLM algorithms and/or prompt design. How satisfied patients are with LLM-generated PILs remains to be evaluated.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc83/11286728/bbf4e4301859/345_2024_5146_Fig1_HTML.jpg

Similar Articles

1. Using artificial intelligence to generate medical literature for urology patients: a comparison of three different large language models.
   World J Urol. 2024 Jul 29;42(1):455. doi: 10.1007/s00345-024-05146-3.
2. Assessing the Application of Large Language Models in Generating Dermatologic Patient Education Materials According to Reading Level: Qualitative Study.
   JMIR Dermatol. 2024 May 16;7:e55898. doi: 10.2196/55898.
3. Harnessing artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in generating clinician-level bariatric surgery recommendations.
   Surg Obes Relat Dis. 2024 Jul;20(7):603-608. doi: 10.1016/j.soard.2024.03.011. Epub 2024 Mar 24.
4. Proficiency, Clarity, and Objectivity of Large Language Models Versus Specialists' Knowledge on COVID-19's Impacts in Pregnancy: Cross-Sectional Pilot Study.
   JMIR Form Res. 2025 Feb 5;9:e56126. doi: 10.2196/56126.
5. Evaluation of the Current Status of Artificial Intelligence for Endourology Patient Education: A Blind Comparison of ChatGPT and Google Bard Against Traditional Information Resources.
   J Endourol. 2024 Aug;38(8):843-851. doi: 10.1089/end.2023.0696. Epub 2024 May 17.
6. Appropriateness and readability of Google Bard and ChatGPT-3.5 generated responses for surgical treatment of glaucoma.
   Rom J Ophthalmol. 2024 Jul-Sep;68(3):243-248. doi: 10.22336/rjo.2024.45.
7. Artificial intelligence-generated patient information leaflets: a comparison of contents according to British Association of Dermatologists standards.
   Clin Exp Dermatol. 2024 Jun 25;49(7):711-714. doi: 10.1093/ced/llad461.
8. Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery.
   Semin Ophthalmol. 2024 Aug;39(6):472-479. doi: 10.1080/08820538.2024.2326058. Epub 2024 Mar 22.
9. Dr. Google to Dr. ChatGPT: assessing the content and quality of artificial intelligence-generated medical information on appendicitis.
   Surg Endosc. 2024 May;38(5):2887-2893. doi: 10.1007/s00464-024-10739-5. Epub 2024 Mar 5.
10. Large language models: a new frontier in paediatric cataract patient education.
   Br J Ophthalmol. 2024 Sep 20;108(10):1470-1476. doi: 10.1136/bjo-2024-325252.

Cited By

1. Evaluation of large language models in patient education for hyperthyroidism: A comparative study of chatgpt, gemini, and deepseek.
   Endocrine. 2025 Sep 13. doi: 10.1007/s12020-025-04421-6.
2. Patient consent in the modern era: Novel tools and practical considerations in urology.
   Curr Urol. 2025 Jul;19(4):235-240. doi: 10.1097/CU9.0000000000000282. Epub 2025 Apr 1.
3. Comparative analysis of the performance of the large language models DeepSeek-V3, DeepSeek-R1, open AI-O3 mini and open AI-O3 mini high in urology.
   World J Urol. 2025 Jul 7;43(1):416. doi: 10.1007/s00345-025-05757-4.
4. The Impact of Language Variability on Artificial Intelligence Performance in Regenerative Endodontics.
   Healthcare (Basel). 2025 May 20;13(10):1190. doi: 10.3390/healthcare13101190.
5. Artificial intelligence and patient education.
   Curr Opin Urol. 2025 May 1;35(3):219-223. doi: 10.1097/MOU.0000000000001267. Epub 2025 Feb 12.

References

1. Responses of Five Different Artificial Intelligence Chatbots to the Top Searched Queries About Erectile Dysfunction: A Comparative Analysis.
   J Med Syst. 2024 Apr 3;48(1):38. doi: 10.1007/s10916-024-02056-0.
2. Evaluating text-based generative artificial intelligence models for patient information regarding cataract surgery.
   J Cataract Refract Surg. 2024 Jan 1;50(1):95-96. doi: 10.1097/j.jcrs.0000000000001288.
3. Evaluation of a chat GPT generated patient information leaflet about laparoscopic cholecystectomy.
   ANZ J Surg. 2024 Mar;94(3):353-355. doi: 10.1111/ans.18834. Epub 2023 Dec 22.
4. Accuracy and comprehensibility of chat-based artificial intelligence for patient information on atrial fibrillation and cardiac implantable electronic devices.
   Europace. 2023 Dec 28;26(1). doi: 10.1093/europace/euad369.
5. Information Quality and Readability: ChatGPT's Responses to the Most Common Questions About Spinal Cord Injury.
   World Neurosurg. 2024 Jan;181:e1138-e1144. doi: 10.1016/j.wneu.2023.11.062. Epub 2023 Nov 22.
6. Answer to the letter to the editor of H. Daungsupawong et al. concerning "Large language models: Are artificial intelligence-based chatbots a reliable source of patient information for spinal surgery?" by R. Stroop et al. (Eur Spine J [2023]; doi:10.1007/s00586-023-07975-z).
   Eur Spine J. 2024 Jan;33(1):362. doi: 10.1007/s00586-023-08038-z. Epub 2023 Nov 18.
7. Quality of information and appropriateness of ChatGPT outputs for urology patients.
   Prostate Cancer Prostatic Dis. 2024 Mar;27(1):159-160. doi: 10.1038/s41391-023-00754-3. Epub 2023 Nov 3.
8. Comparison of ChatGPT and Traditional Patient Education Materials for Men's Health.
   Urol Pract. 2024 Jan;11(1):87-94. doi: 10.1097/UPJ.0000000000000490. Epub 2023 Nov 1.
9. Application of Artificial Intelligence to Patient-Targeted Health Information on Kidney Stone Disease.
   J Ren Nutr. 2024 Mar;34(2):170-176. doi: 10.1053/j.jrn.2023.10.002. Epub 2023 Oct 13.
10. Large language models: Are artificial intelligence-based chatbots a reliable source of patient information for spinal surgery?
   Eur Spine J. 2024 Nov;33(11):4135-4143. doi: 10.1007/s00586-023-07975-z. Epub 2023 Oct 11.