Evaluation of ChatGPT as a Reliable Source of Medical Information on Prostate Cancer for Patients: Global Comparative Survey of Medical Oncologists and Urologists.

Author Information

Stenzl Arnulf, Armstrong Andrew J, Rogers Eamonn, Habr Dany, Walz Jochen, Gleave Martin, Sboner Andrea, Ghith Jennifer, Serfass Lucile, Schuler Kristine W, Garas Sam, Chari Dheepa, Truman Ken, Sternberg Cora N

Affiliations

Department of Urology, University of Tübingen, Tübingen, Germany.

Division of Medical Oncology, Department of Medicine, Duke Cancer Institute Center for Prostate and Urologic Cancer, Durham, North Carolina.

Publication Information

Urol Pract. 2025 Mar;12(2):229-240. doi: 10.1097/UPJ.0000000000000740. Epub 2024 Nov 7.

DOI: 10.1097/UPJ.0000000000000740
PMID: 39509585
Abstract

INTRODUCTION

No consensus exists on performance standards for evaluating the medical responses produced by generative artificial intelligence (AI). The purpose of this study was to assess the ability of Chat Generative Pre-trained Transformer (ChatGPT) to address medical questions on prostate cancer.

METHODS

A global online survey was conducted from April to June 2023 among > 700 medical oncologists or urologists who treat patients with prostate cancer. Participants were unaware that this was a survey evaluating AI. In component 1, responses to 9 questions were written independently by medical writers (MWs; from medical websites) and ChatGPT 4.0 (AI-generated from publicly available information). Respondents were randomly exposed and blinded to both AI-generated and MW-curated responses; evaluation criteria and overall preference were recorded. Exploratory component 2 evaluated AI-generated responses to 5 complex questions with nuanced answers in the medical literature. Responses were evaluated on a 5-point Likert scale. Statistical significance was denoted by P < .05.

RESULTS

In component 1, respondents (N = 602) consistently preferred the clarity of AI-generated responses over MW-curated responses in 7 of 9 questions (P < .05). Despite favoring AI-generated responses when blinded to questions/answers, respondents considered medical websites a more credible source (52%-67%) than ChatGPT (14%). Respondents in component 2 (N = 98) also considered medical websites more credible than ChatGPT, but rated AI-generated responses highly for all evaluation criteria, despite nuanced answers in the medical literature.

CONCLUSIONS

These findings provide insight into how clinicians rate AI-generated and MW-curated responses with evaluation criteria that can be used in future AI validation studies.

Similar Articles

1. Evaluation of ChatGPT as a Reliable Source of Medical Information on Prostate Cancer for Patients: Global Comparative Survey of Medical Oncologists and Urologists.
Urol Pract. 2025 Mar;12(2):229-240. doi: 10.1097/UPJ.0000000000000740. Epub 2024 Nov 7.
2. Evaluation of the Current Status of Artificial Intelligence for Endourology Patient Education: A Blind Comparison of ChatGPT and Google Bard Against Traditional Information Resources.
J Endourol. 2024 Aug;38(8):843-851. doi: 10.1089/end.2023.0696. Epub 2024 May 17.
3. Physician vs. AI-generated messages in urology: evaluation of accuracy, completeness, and preference by patients and physicians.
World J Urol. 2024 Dec 27;43(1):48. doi: 10.1007/s00345-024-05399-y.
4. Both Patients and Plastic Surgeons Prefer Artificial Intelligence-Generated Microsurgical Information.
J Reconstr Microsurg. 2024 Nov;40(9):657-664. doi: 10.1055/a-2273-4163. Epub 2024 Feb 21.
5. Performance of ChatGPT-4 and Bard chatbots in responding to common patient questions on prostate cancer Lu-PSMA-617 therapy.
Front Oncol. 2024 Jul 12;14:1386718. doi: 10.3389/fonc.2024.1386718. eCollection 2024.
6. Use and Application of Large Language Models for Patient Questions Following Total Knee Arthroplasty.
J Arthroplasty. 2024 Sep;39(9):2289-2294. doi: 10.1016/j.arth.2024.03.017. Epub 2024 Mar 13.
7. Evaluating ChatGPT to test its robustness as an interactive information database of radiation oncology and to assess its responses to common queries from radiotherapy patients: A single institution investigation.
Cancer Radiother. 2024 Jun;28(3):258-264. doi: 10.1016/j.canrad.2023.11.005. Epub 2024 Jun 12.
8. Evaluating ChatGPT as a patient resource for frequently asked questions about lung cancer surgery-a pilot study.
J Thorac Cardiovasc Surg. 2025 Apr;169(4):1174-1180.e18. doi: 10.1016/j.jtcvs.2024.09.030. Epub 2024 Sep 24.
9. Probing artificial intelligence in neurosurgical training: ChatGPT takes a neurosurgical residents written exam.
Brain Spine. 2023 Nov 29;4:102715. doi: 10.1016/j.bas.2023.102715. eCollection 2024.
10. Assessing ChatGPT vs. Standard Medical Resources for Endoscopic Sleeve Gastroplasty Education: A Medical Professional Evaluation Study.
Obes Surg. 2024 Jul;34(7):2718-2724. doi: 10.1007/s11695-024-07283-5. Epub 2024 May 17.