Liu Yaxin, Yu Fangfei, Zhang Xiaofei, Tong Xiaohan, Li Kui, Gu Weikuan, Yu Baiquan
Department of Respiratory and Critical Care Medicine, Second Affiliated Hospital of Harbin Medical University, 157 Baojian Road, Nangang District, Harbin, 150081, China, +86 138 3612 4743.
Department of Microbiology, Immunology and Biochemistry, University of Tennessee Health Science Center, Memphis, TN, United States.
JMIR Med Inform. 2025 Aug 13;13:e65365. doi: 10.2196/65365.
BACKGROUND: Asthma is a chronic inflammatory airway disease requiring long-term management. Artificial intelligence (AI)-driven tools such as large language models (LLMs) hold potential for enhancing patient education, especially for multilingual populations. However, comparative assessments of LLMs in disease-specific, bilingual health communication are limited.
OBJECTIVE: This study aimed to evaluate and compare the performance of two advanced LLMs, ChatGPT-4o (OpenAI) and DeepSeek-v3 (DeepSeek AI), in providing bilingual (English and Chinese) education for patients with asthma, focusing on accuracy, completeness, clinical relevance, and language adaptability.
METHODS: A total of 53 asthma-related questions were collected from real patient inquiries across 8 clinical domains. Each question was posed in both English and Chinese to ChatGPT-4o and DeepSeek-v3. Responses were evaluated using a 7-dimensional clinical quality framework (eg, completeness, consensus consistency, and reasoning ability) adapted from Google Health. Three respiratory clinicians performed blinded scoring. Descriptive statistics and Wilcoxon signed-rank tests were used to compare performance across domains and against theoretical maximum scores.
RESULTS: Both models demonstrated high overall quality in generating bilingual educational content. DeepSeek-v3 outperformed ChatGPT-4o in completeness and currency, particularly for treatment-related knowledge and symptom interpretation. ChatGPT-4o showed advantages in clarity and accessibility. In English responses, ChatGPT-4o achieved perfect scores in 5 domains but scored lower in clinical features (mean 3.78, SD 0.16; P=.02), treatment (mean 3.90, SD 0.05; P=.03), and differential diagnosis (mean 3.83, SD 0.29; P=.08).
CONCLUSIONS: ChatGPT-4o and DeepSeek-v3 each offer distinct strengths for bilingual asthma education. ChatGPT-4o is more suitable for general health education because of its expressive clarity, whereas DeepSeek-v3 provides more up-to-date and comprehensive clinical content. Both models can serve as effective supplementary tools for patient self-management but cannot replace professional medical advice. Future AI health care systems should strengthen clinical reasoning, ensure guideline currency, and integrate human oversight to optimize safety and accuracy.