Chinese generative AI models (DeepSeek and Qwen) rival ChatGPT-4 in ophthalmology queries with excellent performance in Arabic and English.

Authors

Sallam Malik, Alasfoor Israa M, Khalid Shahad W, Al-Mulla Rand I, Al-Farajat Amwaj, Mijwil Maad M, Zahrawi Reem, Sallam Mohammed, Egger Jan, Al-Adwan Ahmad S

Affiliations

Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan.

Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan.

Publication information

Narra J. 2025 Apr;5(1):e2371. doi: 10.52225/narra.v5i1.2371. Epub 2025 Apr 8.

DOI:10.52225/narra.v5i1.2371
PMID:40352182
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12059827/
Abstract

The rapid evolution of generative artificial intelligence (genAI) has ushered in a new era of digital medical consultations, with patients turning to AI-driven tools for guidance. The emergence of Chinese-developed genAI models such as DeepSeek-R1 and Qwen-2.5 presented a challenge to the dominance of OpenAI's ChatGPT. The aim of this study was to benchmark the performance of Chinese genAI models against ChatGPT-4o and to assess disparities in performance across English and Arabic. Following the METRICS checklist for genAI evaluation, Qwen-2.5, DeepSeek-R1, and ChatGPT-4o were assessed for completeness, accuracy, and relevance using the CLEAR tool in common patient ophthalmology queries. In English, Qwen-2.5 demonstrated the highest overall performance (CLEAR score: 4.43 ± 0.28), outperforming both DeepSeek-R1 (4.3 ± 0.43) and ChatGPT-4o (4.14 ± 0.41), with p = 0.002. A similar hierarchy emerged in Arabic, with Qwen-2.5 again leading (4.40 ± 0.29), followed by DeepSeek-R1 (4.20 ± 0.49) and ChatGPT-4o (4.14 ± 0.41), with p = 0.007. Each tested genAI model exhibited near-identical performance across the two languages, with ChatGPT-4o demonstrating the most balanced linguistic capabilities (p = 0.957), while Qwen-2.5 and DeepSeek-R1 showed a marginal superiority for English. An in-depth examination of genAI performance across key CLEAR components revealed that Qwen-2.5 consistently excelled in content completeness, factual accuracy, and relevance in both English and Arabic, setting a new benchmark for genAI in medical inquiries. Despite minor linguistic disparities, all three models exhibited robust multilingual capabilities, challenging the long-held assumption that genAI is inherently biased toward English. These findings highlight the evolving nature of AI-driven medical assistance, with Chinese genAI models being able to rival or even surpass ChatGPT-4o in ophthalmology-related queries.
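The reported comparison can be tabulated in a few lines. The following is a minimal sketch that uses only the mean ± SD CLEAR scores quoted in the abstract (summary values, not the study's raw ratings); the dictionary layout and the helper names `ranking` and `english_minus_arabic` are illustrative assumptions, and the code does not reproduce the study's statistical tests.

```python
# Mean and SD of CLEAR scores as quoted in the abstract (summary values only).
# The structure and helper names below are illustrative, not from the paper.
reported = {
    "English": {
        "Qwen-2.5": (4.43, 0.28),
        "DeepSeek-R1": (4.30, 0.43),
        "ChatGPT-4o": (4.14, 0.41),
    },
    "Arabic": {
        "Qwen-2.5": (4.40, 0.29),
        "DeepSeek-R1": (4.20, 0.49),
        "ChatGPT-4o": (4.14, 0.41),
    },
}


def ranking(language):
    """Return model names ordered from highest to lowest mean CLEAR score."""
    scores = reported[language]
    return sorted(scores, key=lambda model: scores[model][0], reverse=True)


def english_minus_arabic(model):
    """Difference in mean CLEAR score (English minus Arabic), rounded to 2 places."""
    diff = reported["English"][model][0] - reported["Arabic"][model][0]
    return round(diff, 2)


if __name__ == "__main__":
    for language in reported:
        print(language, ranking(language))
    for model in reported["English"]:
        print(model, english_minus_arabic(model))
```

Read this way, the ordering of the three models is the same in both languages, and the per-model gap between English and Arabic means is at most 0.1 points.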

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3382/12059827/c328aff28d20/NarraJ-5-e2371-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3382/12059827/28b9c8c13843/NarraJ-5-e2371-g001.jpg
