Suppr 超能文献


Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: Multimetric Assessment.

Affiliations

Department of Urology, Saint George Hospital, Kogarah, Australia.

Faculty of Medicine, The University of New South Wales, Sydney, Australia.

Publication Info

J Med Internet Res. 2024 Aug 14;26:e55939. doi: 10.2196/55939.

DOI: 10.2196/55939
PMID: 39141904
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11358656/
Abstract

BACKGROUND

Artificial intelligence (AI) chatbots, such as ChatGPT, have made significant progress. These chatbots, particularly popular among health care professionals and patients, are transforming patient education and disease experience with personalized information. Accurate, timely patient education is crucial for informed decision-making, especially regarding prostate-specific antigen screening and treatment options. However, the accuracy and reliability of AI chatbots' medical information must be rigorously evaluated. Studies testing ChatGPT's knowledge of prostate cancer are emerging, but there is a need for ongoing evaluation to ensure the quality and safety of information provided to patients.

OBJECTIVE

This study aims to evaluate the quality, accuracy, and readability of ChatGPT-4's responses to common prostate cancer questions posed by patients.

METHODS

Overall, 8 questions were formulated with an inductive approach based on information topics in peer-reviewed literature and Google Trends data. Adapted versions of the Patient Education Materials Assessment Tool for AI (PEMAT-AI), Global Quality Score, and DISCERN-AI tools were used by 4 independent reviewers to assess the quality of the AI responses. The 8 AI outputs were judged by 7 expert urologists, using an assessment framework developed to assess accuracy, safety, appropriateness, actionability, and effectiveness. The AI responses' readability was assessed using established algorithms (Flesch Reading Ease score, Gunning Fog Index, Flesch-Kincaid Grade Level, The Coleman-Liau Index, and Simple Measure of Gobbledygook [SMOG] Index). A brief tool (Reference Assessment AI [REF-AI]) was developed to analyze the references provided by AI outputs, assessing for reference hallucination, relevance, and quality of references.
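The study does not publish its scoring code, but the readability algorithms it names are standard published formulas. A minimal Python sketch of three of them (Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog); the syllable counter is a naive vowel-group heuristic for illustration only, whereas production tools use pronunciation dictionaries:

```python
import re

def flesch_reading_ease(words, sentences, syllables):
    # Flesch Reading Ease: higher = easier; scores of 30-50 read as "difficult"
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    # Maps the same counts onto a US school-grade level
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words, sentences, complex_words):
    # "Complex" words = words of three or more syllables
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

def count_syllables(word):
    # Naive heuristic: count vowel groups, drop a trailing silent "e"
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)
```

For example, a response with 100 words, 5 sentences, and 130 syllables scores a Flesch Reading Ease of about 76.6 ("fairly easy"), well above the mean of 45.97 reported below for the ChatGPT-4 outputs.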

RESULTS

The PEMAT-AI understandability score was very good (mean 79.44%, SD 10.44%), the DISCERN-AI rating was scored as "good" quality (mean 13.88, SD 0.93), and the Global Quality Score was high (mean 4.46/5, SD 0.50). Natural Language Assessment Tool for AI had pooled mean accuracy of 3.96 (SD 0.91), safety of 4.32 (SD 0.86), appropriateness of 4.45 (SD 0.81), actionability of 4.05 (SD 1.15), and effectiveness of 4.09 (SD 0.98). The readability algorithm consensus was "difficult to read" (Flesch Reading Ease score mean 45.97, SD 8.69; Gunning Fog Index mean 14.55, SD 4.79), averaging an 11th-grade reading level, equivalent to 15- to 17-year-olds (Flesch-Kincaid Grade Level mean 12.12, SD 4.34; The Coleman-Liau Index mean 12.75, SD 1.98; SMOG Index mean 11.06, SD 3.20). REF-AI identified 2 reference hallucinations, while the majority (28/30, 93%) of references appropriately supplemented the text. Most references (26/30, 86%) were from reputable government organizations, while a handful were direct citations from scientific literature.
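The pooled means and SDs above aggregate per-expert Likert ratings. A sketch of that aggregation with hypothetical data (the ratings below are invented for illustration, not taken from the study):

```python
from statistics import mean, stdev

# Hypothetical accuracy ratings (1-5 Likert) from 7 urologists for one question
accuracy_ratings = [4, 5, 3, 4, 4, 3, 5]

pooled_mean = mean(accuracy_ratings)
pooled_sd = stdev(accuracy_ratings)  # sample SD, as abstracts typically report
```

With these invented ratings, `pooled_mean` is 4.0 and `pooled_sd` about 0.82, the same form as the reported "mean 3.96 (SD 0.91)".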

CONCLUSIONS

Our analysis found that ChatGPT-4 provides generally good responses to common prostate cancer queries, making it a potentially valuable tool for patient education in prostate cancer care. Objective quality assessment tools indicated that the natural language processing outputs were generally reliable and appropriate, but there is room for improvement.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/199b/11358656/c4ba9373a7ad/jmir_v26i1e55939_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/199b/11358656/6b02a7647c32/jmir_v26i1e55939_fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/199b/11358656/481d097c5935/jmir_v26i1e55939_fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/199b/11358656/6b9edd1f814d/jmir_v26i1e55939_fig4.jpg

Similar Articles

1. Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: Multimetric Assessment.
   J Med Internet Res. 2024 Aug 14;26:e55939. doi: 10.2196/55939.
2. Assessing the Readability of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study.
   Cureus. 2024 Jul 4;16(7):e63865. doi: 10.7759/cureus.63865.
3. Assessment of Artificial Intelligence Chatbot Responses to Top Searched Queries About Cancer.
   JAMA Oncol. 2023 Oct 1;9(10):1437-1440. doi: 10.1001/jamaoncol.2023.2947.
4. AI-Generated Information for Vascular Patients: Assessing the Standard of Procedure-Specific Information Provided by the ChatGPT AI-Language Model.
   Cureus. 2023 Nov 30;15(11):e49764. doi: 10.7759/cureus.49764.
5. Assessing the readability, reliability, and quality of artificial intelligence chatbot responses to the 100 most searched queries about cardiopulmonary resuscitation: An observational study.
   Medicine (Baltimore). 2024 May 31;103(22):e38352. doi: 10.1097/MD.0000000000038352.
6. Accuracy, readability, and understandability of large language models for prostate cancer information to the public.
   Prostate Cancer Prostatic Dis. 2024 May 14. doi: 10.1038/s41391-024-00826-y.
7. Empowering patients: how accurate and readable are large language models in renal cancer education.
   Front Oncol. 2024 Sep 26;14:1457516. doi: 10.3389/fonc.2024.1457516.
8. Enhancing Health Literacy: Evaluating the Readability of Patient Handouts Revised by ChatGPT's Large Language Model.
   Otolaryngol Head Neck Surg. 2024 Dec;171(6):1751-1757. doi: 10.1002/ohn.927.
9. Evaluating ChatGPT-4's performance as a digital health advisor for otosclerosis surgery.
   Front Surg. 2024 Jun 5;11:1373843. doi: 10.3389/fsurg.2024.1373843.
10. Enhancing Readability of Online Patient-Facing Content: The Role of AI Chatbots in Improving Cancer Information Accessibility.
    J Natl Compr Canc Netw. 2024 May 15;22(2 D):e237334. doi: 10.6004/jnccn.2023.7334.

Cited By

1. ChatGPT and human dietitian responses to diet-related questions on an online Q&A platform: A comparative study.
   Digit Health. 2025 Aug 21;11:20552076251361381. doi: 10.1177/20552076251361381.
2. Perceptions of large language models in medical education and clinical practice among pediatric emergency physicians in Saudi Arabia: a multiregional cross-sectional study.
   Front Public Health. 2025 Jul 30;13:1634638. doi: 10.3389/fpubh.2025.1634638.
3. Bridging knowledge gaps in breast cancer prevention: Insights from Ethiopia.
   World J Clin Oncol. 2025 Jul 24;16(7):106687. doi: 10.5306/wjco.v16.i7.106687.
4. Evaluating large language models as an educational tool for meningioma patients: patient and clinician perspectives.
   Radiat Oncol. 2025 Jun 14;20(1):101. doi: 10.1186/s13014-025-02671-2.
5. Large language models' capabilities in responding to tuberculosis medical questions: testing ChatGPT, Gemini, and Copilot.
   Sci Rep. 2025 May 23;15(1):18004. doi: 10.1038/s41598-025-03074-9.
6. [Is the application of digital technologies the game changer for surgical training of the future? A Germany-wide analysis].
   Chirurgie (Heidelb). 2025 May 22. doi: 10.1007/s00104-025-02306-y.
7. Evaluating an AI Chatbot "Prostate Cancer Info" for Providing Quality Prostate Cancer Screening Information: Cross-Sectional Study.
   JMIR Cancer. 2025 May 21;11:e72522. doi: 10.2196/72522.
8. Management of Burns: Multi-Center Assessment Comparing AI Models and Experienced Plastic Surgeons.
   J Clin Med. 2025 Apr 29;14(9):3078. doi: 10.3390/jcm14093078.
9. AI in Home Care-Evaluation of Large Language Models for Future Training of Informal Caregivers: Observational Comparative Case Study.
   J Med Internet Res. 2025 Apr 28;27:e70703. doi: 10.2196/70703.
10. Leveraging artificial intelligence chatbots for anemia prevention: A comparative study of ChatGPT-3.5, Copilot, and Gemini outputs against Google Search results.
    PEC Innov. 2025 Apr 1;6:100390. doi: 10.1016/j.pecinn.2025.100390.

References

1. New Frontiers in Health Literacy: Using ChatGPT to Simplify Health Information for People in the Community.
   J Gen Intern Med. 2024 Mar;39(4):573-577. doi: 10.1007/s11606-023-08469-w.
2. Quality of information and appropriateness of ChatGPT outputs for urology patients.
   Prostate Cancer Prostatic Dis. 2024 Mar;27(1):159-160. doi: 10.1038/s41391-023-00754-3.
3. Can ChatGPT, an Artificial Intelligence Language Model, Provide Accurate and High-quality Patient Information on Prostate Cancer?
   Urology. 2023 Oct;180:35-58. doi: 10.1016/j.urology.2023.05.040.
4. Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument.
   J Med Internet Res. 2023 Jun 30;25:e47479. doi: 10.2196/47479.
5. Evaluating Chatbot Efficacy for Answering Frequently Asked Questions in Plastic Surgery: A ChatGPT Case Study Focused on Breast Augmentation.
   Aesthet Surg J. 2023 Sep 14;43(10):1126-1135. doi: 10.1093/asj/sjad140.
6. Aesthetic Surgery Advice and Counseling from Artificial Intelligence: A Rhinoplasty Consultation with ChatGPT.
   Aesthetic Plast Surg. 2023 Oct;47(5):1985-1993. doi: 10.1007/s00266-023-03338-7.
7. Using ChatGPT to evaluate cancer myths and misconceptions: artificial intelligence and cancer information.
   JNCI Cancer Spectr. 2023 Mar 1;7(2). doi: 10.1093/jncics/pkad015.
8. Evaluating the Feasibility of ChatGPT in Healthcare: An Analysis of Multiple Clinical and Research Scenarios.
   J Med Syst. 2023 Mar 4;47(1):33. doi: 10.1007/s10916-023-01925-4.
9. Artificial Hallucinations in ChatGPT: Implications in Scientific Writing.
   Cureus. 2023 Feb 19;15(2):e35179. doi: 10.7759/cureus.35179.
10. ChatGPT: the future of discharge summaries?
    Lancet Digit Health. 2023 Mar;5(3):e107-e108. doi: 10.1016/S2589-7500(23)00021-3.