
Battle of the artificial intelligence: a comprehensive comparative analysis of DeepSeek and ChatGPT for urinary incontinence-related questions.

Authors

Cao Huawei, Hao Changzhen, Zhang Tao, Zheng Xiang, Gao Zihao, Wu Jiyue, Gan Lijian, Liu Yu, Zeng Xiangjun, Wang Wei

Affiliations

Department of Urology, Beijing Chao-yang Hospital, Capital Medical University, Beijing, China.

Department of Urology, Peking University International Hospital, Beijing, China.

Publication

Front Public Health. 2025 Jul 23;13:1605908. doi: 10.3389/fpubh.2025.1605908. eCollection 2025.

DOI: 10.3389/fpubh.2025.1605908
PMID: 40771241
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12325333/
Abstract

BACKGROUND

With the rapid advancement and widespread adoption of artificial intelligence (AI), patients increasingly turn to AI for initial medical guidance. Therefore, a comprehensive evaluation of AI-generated responses is warranted. This study aimed to compare the performance of DeepSeek and ChatGPT in answering urinary incontinence-related questions and to delineate their respective strengths and limitations.

METHODS

Based on the American Urological Association/Society of Urodynamics, Female Pelvic Medicine & Urogenital Reconstruction (AUA/SUFU) and European Association of Urology (EAU) guidelines, we designed 25 urinary incontinence-related questions. Responses from DeepSeek and ChatGPT-4.0 were evaluated for reliability, quality, and readability. Fleiss' kappa was employed to calculate inter-rater reliability. For clinical case scenarios, we additionally assessed the appropriateness of responses. A comprehensive comparative analysis was performed.
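Inter-rater reliability via Fleiss' kappa, as used in the methods above, can be sketched as follows. This is a minimal illustration with hypothetical rating counts, not the study's data; the function name and toy matrix are assumptions for demonstration only.

```python
import numpy as np

def fleiss_kappa(ratings: np.ndarray) -> float:
    """Fleiss' kappa for a count matrix of shape (n_items, n_categories),
    where ratings[i, j] is the number of raters assigning item i to category j."""
    n = ratings.sum(axis=1)[0]                 # raters per item (assumed constant)
    p_j = ratings.sum(axis=0) / ratings.sum()  # overall category proportions
    # per-item observed agreement
    P_i = (np.square(ratings).sum(axis=1) - n) / (n * (n - 1))
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

# toy example: 4 responses scored by 3 raters into 3 quality categories
counts = np.array([[3, 0, 0],
                   [0, 3, 0],
                   [1, 2, 0],
                   [0, 1, 2]])
print(round(fleiss_kappa(counts), 3))
```

Values above roughly 0.6 are conventionally read as substantial agreement; the study does not report which threshold it adopted.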

RESULTS

The modified DISCERN (mDISCERN) scores for DeepSeek and ChatGPT-4.0 were 28.24 ± 0.88 and 28.76 ± 1.56, respectively, showing no statistically significant difference [P = 0.188, Cohen's d = 0.41 (95% CI: -0.15, 0.97)]. Both AI chatbots rarely provided source references. In terms of quality, DeepSeek achieved a higher mean Global Quality Scale (GQS) score than ChatGPT-4.0 (4.76 ± 0.52 vs. 4.32 ± 0.69, P = 0.001). DeepSeek also demonstrated superior readability, as indicated by a higher Flesch Reading Ease (FRE) score (76.43 ± 10.90 vs. 70.95 ± 11.16, P = 0.039) and a lower Simple Measure of Gobbledygook (SMOG) index (12.26 ± 1.39 vs. 14.21 ± 1.88, P < 0.001), suggesting easier comprehension. Regarding guideline adherence, DeepSeek had 11 (73.33%) fully compliant responses, while ChatGPT-4.0 had 13 (86.67%), with no significant difference [P = 0.651, Cohen's d = 0.083 (95% CI: 0.021, 0.232)].
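As a sanity check, the reported mDISCERN effect size can be reproduced from the summary statistics above. This sketch assumes equal group sizes and a simple pooled standard deviation; the authors' exact pooling formula is not stated in the abstract.

```python
import math

def cohens_d(m1: float, s1: float, m2: float, s2: float) -> float:
    """Cohen's d with a simple pooled SD (equal group sizes assumed)."""
    pooled_sd = math.sqrt((s1 ** 2 + s2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

# mDISCERN: ChatGPT-4.0 28.76 ± 1.56 vs. DeepSeek 28.24 ± 0.88
d = cohens_d(28.76, 1.56, 28.24, 0.88)
print(round(d, 2))  # 0.41, matching the reported effect size
```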

CONCLUSION

DeepSeek and ChatGPT-4.0 might exhibit comparable reliability in answering urinary incontinence-related questions, though both lacked sufficient references. However, DeepSeek outperformed ChatGPT-4.0 in response quality and readability. While both AI chatbots largely adhered to clinical guidelines, occasional deviations were observed. Further refinements are necessary before the widespread clinical implementation of AI chatbots in urology.


