

Chatbots vs andrologists: Testing 25 clinical cases.

Authors

Perrot Ophélie, Schirmann Aurelie, Vidart Adrien, Guillot-Tantay Cyrille, Izard Vincent, Lebret Thierry, Boillot Bernard, Mesnard Benoit, Lebacle Cedric, Madec François-Xavier

Affiliation

Foch Hospital, Urology department, Suresnes, France.

Publication

Fr J Urol. 2024 Jun;34(5):102636. doi: 10.1016/j.fjurol.2024.102636. Epub 2024 Apr 8.

DOI: 10.1016/j.fjurol.2024.102636
PMID: 38599321
Abstract

OBJECTIVE

AI language models are developing rapidly, and their place in medicine remains undefined. The aim of our study was to compare responses to andrology clinical cases between chatbots and andrologists, in order to assess the reliability of these technologies.

MATERIALS AND METHODS

We analyzed the responses of 32 experts, 18 residents and three chatbots (ChatGPT v3.5, v4 and Bard) to 25 andrology clinical cases. Responses were assessed on a Likert scale ranging from 0 to 2 for each question (0: false or no response; 1: partially correct response; 2: correct response), on the basis of the latest national recommendations or, in their absence, international recommendations. We compared the mean scores obtained across all cases by the different groups.
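As an illustrative sketch (not the authors' code), the scoring and aggregation described above amount to summing 0–2 Likert points per question within a case and averaging across cases; the case contents and ratings below are invented for demonstration.

```python
# Illustrative sketch of the study's scoring scheme (not the authors' code).
# Each question is graded on a 0-2 Likert scale; a responder's score on a
# case is the sum over its questions, and the overall score is the mean
# across cases. The ratings below are invented for demonstration.

POINTS = {"false_or_none": 0, "partially_correct": 1, "correct": 2}

def case_score(ratings):
    """Total Likert points earned on one clinical case."""
    return sum(POINTS[r] for r in ratings)

def mean_score(all_cases):
    """Mean score across all cases for one responder."""
    return sum(case_score(c) for c in all_cases) / len(all_cases)

# Two hypothetical cases with three graded questions each:
cases = [
    ["correct", "partially_correct", "correct"],        # 2 + 1 + 2 = 5
    ["correct", "false_or_none", "partially_correct"],  # 2 + 0 + 1 = 3
]
print(mean_score(cases))  # → 4.0
```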

RESULTS

Experts obtained a higher mean score (m=11.0/12.4, σ=1.4) than ChatGPT v4 (m=10.7/12.4, σ=2.2, p=0.6475), ChatGPT v3.5 (m=9.5/12.4, σ=2.1, p=0.0062) and Bard (m=7.2/12.4, σ=3.3, p<0.0001). Residents obtained a mean score (m=9.4/12.4, σ=1.7) higher than Bard (m=7.2/12.4, σ=3.3, p=0.0053) but lower than ChatGPT v3.5 (m=9.5/12.4, σ=2.1, p=0.8393), ChatGPT v4 (m=10.7/12.4, σ=2.2, p=0.0183) and experts (m=11.0/12.4, σ=1.4, p=0.0009). ChatGPT v4's performance (m=10.7, σ=2.2) was better than that of ChatGPT v3.5 (m=9.5, σ=2.1, p=0.0476) and Bard (m=7.2, σ=3.3, p<0.0001).

CONCLUSION

The use of chatbots in medicine could prove relevant, but further studies are needed before they can be integrated into clinical practice.


