DeepSeek vs ChatGPT: a comparison study of their performance in answering prostate cancer radiotherapy questions in multiple languages.

Author information

Luo Peng-Wei, Liu Ji-Wen, Xie Xi, Jiang Jia-Wei, Huo Xin-Yu, Chen Zhen-Lin, Huang Zhang-Cheng, Jiang Shao-Qin, Li Meng-Qiang

Affiliations

Department of Urology, Fujian Union Hospital, Fujian Medical University, Fuzhou, Fujian, China.

Department of Urology, The First Affiliated Hospital of Chengdu Medical College, Chengdu, Sichuan, China.

Publication information

Am J Clin Exp Urol. 2025 Apr 25;13(2):176-185. doi: 10.62347/UIAP7979. eCollection 2025.

Abstract

INTRODUCTION

The quality of medical information generated by large language models (LLMs) is crucial for patient education and clinical decision-making. This study aims to evaluate the performance of two LLMs, DeepSeek and ChatGPT, in answering questions about prostate cancer radiotherapy in both Chinese and English. Through a comparative analysis, we aim to determine which model provides higher-quality answers in each language.

METHODS

A structured evaluation framework was developed using a set of clinically relevant questions covering three key domains: foundational knowledge, patient education, and treatment and follow-up care. Responses from DeepSeek and ChatGPT were generated in both English and Chinese and independently assessed by a panel of five oncology specialists using a five-point Likert scale. Statistical analyses, including the Wilcoxon signed-rank test, were performed to compare the models' performance across different linguistic contexts.
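The abstract names the Wilcoxon signed-rank test but not how it was applied. The sketch below is one plausible setup, assuming a single score per model per question (how the five raters' scores were combined is not stated), with each question's DeepSeek and ChatGPT scores in the same language forming one pair. Zero differences are dropped, tied differences receive average ranks, and a normal approximation with tie correction gives a two-sided P value. The function name `wilcoxon_signed_rank` and the example scores are hypothetical and are not study data.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples (normal approximation).

    Zero differences are discarded, tied |differences| receive average ranks,
    and the null variance includes the standard tie correction. No continuity
    correction is applied. Returns (w_plus, z, p_value).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length (one pair per question)")
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero; the test is undefined")

    # Sort indices by |difference| and assign 1-based average ranks to tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0  # sum of (t^3 - t) over tie groups of size t
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        avg_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        size = end - start + 1
        tie_term += size ** 3 - size
        start = end + 1

    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)  # sum of positive ranks
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, z, p_value


# Hypothetical paired scores for 8 questions (illustration only, not study data).
deepseek_scores = [5, 5, 4, 5, 5, 3, 5, 4]
chatgpt_scores = [4, 5, 3, 4, 3, 3, 4, 4]
w_plus, z, p = wilcoxon_signed_rank(deepseek_scores, chatgpt_scores)
print(f"W+ = {w_plus}, z = {z:.3f}, two-sided P = {p:.4f}")
```

With few non-zero differences an exact null distribution is usually preferable to the normal approximation; SciPy's `scipy.stats.wilcoxon` offers both exact and approximate modes.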

RESULTS

This study ultimately included 33 questions for scoring. In Chinese, DeepSeek outperformed ChatGPT, receiving the top rating (score = 5) for 75.76% vs. 36.36% of responses (P < 0.001), with particular strength in foundational knowledge (76.92% vs. 38.46%, P = 0.047) and treatment/follow-up (81.82% vs. 36.36%, P = 0.031). In English, ChatGPT performed comparably (66.7% vs. 54.55% top-rated responses, P = 0.236), with a small, non-significant advantage in treatment/follow-up (63.64% vs. 54.55%, P = 0.563). DeepSeek retained an advantage in English foundational knowledge (69.23% vs. 30.77%, P = 0.047) and patient education (88.89% vs. 55.56%, P = 0.125). These findings highlight DeepSeek's stronger Chinese proficiency and the impact of language-specific optimization.
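Each percentage above is a share of a small number of questions, so it maps back to a whole number of top-rated responses; for example, 75.76% of 33 is 25/33 and 36.36% is 12/33. The sketch below performs this back-calculation for every reported pair. `implied_count` is a hypothetical helper, and the per-domain question counts (13 foundational knowledge, 9 patient education, 11 treatment/follow-up) are inferred from the percentages, since the abstract states only the total of 33. Figures are kept in the order the abstract reports them, because the abstract does not always say which model each figure belongs to.

```python
def implied_count(percent, n, tol=0.05):
    """Return the smallest k in 0..n with |100*k/n - percent| <= tol, else None.

    tol = 0.05 accommodates figures reported to one decimal place (e.g. 66.7%).
    """
    for k in range(n + 1):
        if abs(100 * k / n - percent) <= tol:
            return k
    return None


# (n, first figure, second figure) in the order the abstract reports them.
# Domain sizes 13, 9 and 11 are inferred from the percentages (13 + 9 + 11 = 33);
# the abstract does not state them.
reported = {
    "Chinese, overall": (33, 75.76, 36.36),
    "Chinese, foundational knowledge": (13, 76.92, 38.46),
    "Chinese, treatment/follow-up": (11, 81.82, 36.36),
    "English, overall": (33, 66.7, 54.55),
    "English, foundational knowledge": (13, 69.23, 30.77),
    "English, patient education": (9, 88.89, 55.56),
    "English, treatment/follow-up": (11, 63.64, 54.55),
}

for label, (n, first, second) in reported.items():
    print(f"{label}: {implied_count(first, n)}/{n} vs. {implied_count(second, n)}/{n}")
```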

CONCLUSIONS

This study shows that DeepSeek performs very well in providing medical information in Chinese, while the two models perform similarly in English. These findings underscore the importance of choosing artificial intelligence (AI) models suited to the target language to improve the accuracy and reliability of medical AI applications. While both models show promise in supporting patient education and clinical decision-making, human expert review remains necessary to ensure response accuracy and minimize potential misinformation.
