Assessing the Role of Large Language Models Between ChatGPT and DeepSeek in Asthma Education for Bilingual Individuals: Comparative Study.

Authors

Liu Yaxin, Yu Fangfei, Zhang Xiaofei, Tong Xiaohan, Li Kui, Gu Weikuan, Yu Baiquan

Affiliations

Department of Respiratory and Critical Care Medicine, Second Affiliated Hospital of Harbin Medical University, 157 Baojian Road, Nangang District, Harbin, 150081, China, +86 138 3612 4743.

Department of Microbiology, Immunology and Biochemistry, University of Tennessee Health Science Center, Memphis, TN, United States.

Publication information

JMIR Med Inform. 2025 Aug 13;13:e65365. doi: 10.2196/65365.


DOI:10.2196/65365
PMID:40802989
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12349887/
Abstract

BACKGROUND: Asthma is a chronic inflammatory airway disease requiring long-term management. Artificial intelligence (AI)-driven tools such as large language models (LLMs) hold potential for enhancing patient education, especially for multilingual populations. However, comparative assessments of LLMs in disease-specific, bilingual health communication are limited.

OBJECTIVE: This study aimed to evaluate and compare the performance of two advanced LLMs, ChatGPT-4o (OpenAI) and DeepSeek-v3 (DeepSeek AI), in providing bilingual (English and Chinese) education for patients with asthma, focusing on accuracy, completeness, clinical relevance, and language adaptability.

METHODS: A total of 53 asthma-related questions were collected from real patient inquiries across 8 clinical domains. Each question was posed in both English and Chinese to ChatGPT-4o and DeepSeek-v3. Responses were evaluated using a 7D clinical quality framework (eg, completeness, consensus consistency, and reasoning ability) adapted from Google Health. Three respiratory clinicians performed blinded scoring evaluations. Descriptive statistics and Wilcoxon signed-rank tests were applied to compare performance across domains and against theoretical maximums.

RESULTS: Both models demonstrated high overall quality in generating bilingual educational content. DeepSeek-v3 outperformed ChatGPT-4o in completeness and currency, particularly in treatment-related knowledge and symptom interpretation. ChatGPT-4o showed advantages in clarity and accessibility. In English responses, ChatGPT achieved perfect scores across 5 domains, but scored lower in clinical features (mean 3.78, SD 0.16; P=.02), treatment (mean 3.90, SD 0.05; P=.03), and differential diagnosis (mean 3.83, SD 0.29; P=.08).

CONCLUSIONS: ChatGPT-4o and DeepSeek-v3 each offer distinct strengths for bilingual asthma education. While ChatGPT is more suitable for general health education due to its expressive clarity, DeepSeek provides more up-to-date and comprehensive clinical content. Both models can serve as effective supplementary tools for patient self-management but cannot replace professional medical advice. Future AI health care systems should enhance clinical reasoning, ensure guideline currency, and integrate human oversight to optimize safety and accuracy.
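The comparison "against theoretical maximums" mentioned in the methods can be illustrated with a small sketch of a one-sample Wilcoxon signed-rank test. This is not the authors' code: the abstract does not state which software or test variant was used. The scores below and the maximum of 4 are hypothetical values chosen for illustration, and the function names are invented here. The sketch drops zero differences, assigns average ranks to ties, and computes an exact two-sided P value by enumerating sign assignments, which is only practical for small samples.

```python
from itertools import product


def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_vs_reference(scores, reference):
    """Exact two-sided Wilcoxon signed-rank test of scores against a fixed value.

    Scores equal to the reference are dropped before ranking.
    Returns (w_plus, w_minus, p_value).
    """
    diffs = [s - reference for s in scores if s != reference]
    if not diffs:
        return 0.0, 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    observed = min(w_plus, w_minus)
    total = sum(ranks)
    # Under the null hypothesis every sign pattern is equally likely,
    # so count the patterns at least as extreme as the observed one.
    hits = 0
    patterns = 0
    for signs in product((0, 1), repeat=len(ranks)):
        positive_sum = sum(r for r, s in zip(ranks, signs) if s)
        if min(positive_sum, total - positive_sum) <= observed:
            hits += 1
        patterns += 1
    return w_plus, w_minus, hits / patterns


# Hypothetical per-question scores for one domain (NOT data from the study),
# compared with an assumed theoretical maximum of 4.
hypothetical_scores = [4.0, 3.5, 3.75, 4.0, 3.5, 3.25, 3.75, 3.0]
w_plus, w_minus, p_value = wilcoxon_vs_reference(hypothetical_scores, 4.0)
print(w_plus, w_minus, p_value)  # prints: 0 21.0 0.03125
```

Because no score can exceed the maximum, every nonzero difference has the same sign, so the exact P value in this setting depends only on how many scores fall below the maximum. For larger samples, a library routine such as `scipy.stats.wilcoxon` would be the usual choice.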

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c97f/12349887/5c8356538620/medinform-v13-e65365-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c97f/12349887/99b600d5439c/medinform-v13-e65365-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c97f/12349887/761c112d16c9/medinform-v13-e65365-g002.jpg
