DeepSeek vs ChatGPT: a comparison study of their performance in answering prostate cancer radiotherapy questions in multiple languages.

Authors

Luo Peng-Wei, Liu Ji-Wen, Xie Xi, Jiang Jia-Wei, Huo Xin-Yu, Chen Zhen-Lin, Huang Zhang-Cheng, Jiang Shao-Qin, Li Meng-Qiang

Affiliations

Department of Urology, Fujian Union Hospital, Fujian Medical University, Fuzhou, Fujian, China.

Department of Urology, The First Affiliated Hospital of Chengdu Medical College, Chengdu, Sichuan, China.

Publication information

Am J Clin Exp Urol. 2025 Apr 25;13(2):176-185. doi: 10.62347/UIAP7979. eCollection 2025.

DOI:10.62347/UIAP7979

PMID:40400997

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12089221/

Abstract

INTRODUCTION

The medical information generated by large language models (LLMs) is crucial for improving patient education and clinical decision-making. This study evaluates the performance of two LLMs (DeepSeek and ChatGPT) in answering questions about prostate cancer radiotherapy in both Chinese and English. Through a comparative analysis, we aim to determine which model provides higher-quality answers in each language.

METHODS

A structured evaluation framework was developed using a set of clinically relevant questions covering three key domains: foundational knowledge, patient education, and treatment and follow-up care. Responses from DeepSeek and ChatGPT were generated in both English and Chinese and independently assessed by a panel of five oncology specialists using a five-point Likert scale. Statistical analyses, including the Wilcoxon signed-rank test, were performed to compare the models' performance across different linguistic contexts.
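The paired comparison described above can be sketched in a few lines. The abstract names the Wilcoxon signed-rank test but not the software or exact settings used, so the following is only a minimal illustration under stated assumptions: the scores are hypothetical, not study data; the function names are invented for this sketch; zero differences are dropped; and the two-sided p-value is obtained by exact enumeration of sign patterns, which is practical only for a small number of pairs.

```python
from itertools import product


def midranks(values):
    """Rank values in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(a, b):
    """Two-sided exact Wilcoxon signed-rank test for paired scores.

    Pairs with a zero difference are dropped. The p-value enumerates all
    2**n sign assignments, so keep n (non-zero pairs) to about 20 or fewer.
    Returns (W_plus, p_value).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    if not diffs:
        return 0.0, 1.0
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2  # expected W+ under the null hypothesis
    observed = abs(w_plus - center)
    extreme = 0
    total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(w - center) >= observed - 1e-9:
            extreme += 1
    return w_plus, extreme / total


# Hypothetical Likert scores (1-5) for the same eight questions.
model_a = [5, 5, 4, 5, 5, 4, 5, 5]
model_b = [4, 3, 4, 4, 5, 3, 4, 3]
w_plus, p_value = wilcoxon_signed_rank(model_a, model_b)
print(w_plus, p_value)  # 21.0 0.03125
```

A paired test fits this design because both models answer the same questions, so each question contributes one score difference. For the full set of 33 questions, a statistics package with a normal approximation or a dedicated exact routine would be more practical than enumeration.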

RESULTS

This study ultimately included 33 questions for scoring. In Chinese, DeepSeek outperformed ChatGPT, achieving top ratings (score = 5) in 75.76% vs. 36.36% of responses (P < 0.001), excelling in foundational knowledge (76.92% vs. 38.46%, P = 0.047) and treatment/follow-up (81.82% vs. 36.36%, P = 0.031). In English, ChatGPT showed comparable performance (66.7% vs. 54.55% top-rated responses, P = 0.236), with marginal advantages in treatment/follow-up (63.64% vs. 54.55%, P = 0.563). DeepSeek maintained strengths in English foundational knowledge (69.23% vs. 30.77%, P = 0.047) and patient education (88.89% vs. 55.56%, P = 0.125). These findings underscore DeepSeek's stronger performance in Chinese and the impact of language-specific optimization.
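As a quick arithmetic check, the overall Chinese-language percentages can be converted back into question counts. This sketch assumes the denominator is the 33 included questions rather than individual reviewer ratings, which the abstract does not state; the helper name `top_rated_count` is hypothetical.

```python
def top_rated_count(percent, n_questions=33):
    """Nearest whole number of questions matching a reported percentage."""
    return round(percent / 100 * n_questions)


# Overall top-rating percentages reported for the Chinese responses.
for model, pct in {"DeepSeek": 75.76, "ChatGPT": 36.36}.items():
    k = top_rated_count(pct)
    print(f"{model}: {k}/33 = {k / 33:.2%}")
# DeepSeek: 25/33 = 75.76%
# ChatGPT: 12/33 = 36.36%
```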

CONCLUSIONS

This study shows that DeepSeek performs excellently in providing Chinese medical information, while the two models perform similarly in an English environment. These findings underscore the importance of selecting language-specific artificial intelligence (AI) models to enhance the accuracy and reliability of medical AI applications. While both models show promise in supporting patient education and clinical decision-making, human expert review remains necessary to ensure response accuracy and minimize potential misinformation.
