The performance of OpenAI ChatGPT-4 and Google Gemini in virology multiple-choice questions: a comparative analysis of English and Arabic responses.

Affiliations

Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan.

Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Queen Rania Al-Abdullah Street-Aljubeiha, P.O. Box: 13046, Amman, 11942, Jordan.

Publication information

BMC Res Notes. 2024 Sep 3;17(1):247. doi: 10.1186/s13104-024-06920-7.

DOI:10.1186/s13104-024-06920-7
PMID:39228001
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11373487/
Abstract

OBJECTIVE

The integration of artificial intelligence (AI) in healthcare education is inevitable. Understanding the proficiency of generative AI in different languages to answer complex questions is crucial for educational purposes. The study objective was to compare the performance of ChatGPT-4 and Gemini in answering virology multiple-choice questions (MCQs) in English and Arabic, while assessing the quality of the generated content. Both AI models' responses to 40 virology MCQs were assessed for correctness and quality based on the CLEAR tool designed for evaluation of AI-generated content. The MCQs were classified into lower and higher cognitive categories based on the revised Bloom's taxonomy. The study design considered the METRICS checklist for the design and reporting of generative AI-based studies in healthcare.

RESULTS

ChatGPT-4 and Gemini performed better in English than in Arabic, with ChatGPT-4 consistently surpassing Gemini in correctness and CLEAR scores. ChatGPT-4 led Gemini in correctness, with 80% vs. 62.5% in English and 65% vs. 55% in Arabic. For both AI models, superior performance in lower cognitive domains was reported. Both ChatGPT-4 and Gemini exhibited potential in educational applications; nevertheless, their performance varied across languages, highlighting the importance of continued development to ensure effective AI integration in healthcare education globally.
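The reported percentages can be related back to the 40-question set with simple arithmetic. The following is a minimal sketch, not the authors' analysis: it assumes every model and language condition was scored on all 40 MCQs, and the names `reported_pct`, `implied_correct`, and `language_gap` are hypothetical helpers introduced for illustration only.

```python
# Correctness percentages as reported in the abstract.
# Assumption: each model/language condition was scored on all 40 MCQs.
TOTAL_MCQS = 40

reported_pct = {
    ("ChatGPT-4", "English"): 80.0,
    ("Gemini", "English"): 62.5,
    ("ChatGPT-4", "Arabic"): 65.0,
    ("Gemini", "Arabic"): 55.0,
}


def implied_correct(pct: float, total: int = TOTAL_MCQS) -> float:
    """Number of correct answers implied by a percentage of `total` questions."""
    return pct * total / 100


def language_gap(model: str) -> float:
    """English minus Arabic correctness for one model, in percentage points."""
    return reported_pct[(model, "English")] - reported_pct[(model, "Arabic")]


# Implied number of correct answers per model and language.
counts = {key: implied_correct(pct) for key, pct in reported_pct.items()}

for (model, lang), n in counts.items():
    print(f"{model} ({lang}): {n:g}/{TOTAL_MCQS} correct")

for model in ("ChatGPT-4", "Gemini"):
    print(f"{model} English-Arabic gap: {language_gap(model):g} percentage points")
```

Under that assumption, each percentage corresponds to a whole number of questions, and the English-to-Arabic drop is larger for ChatGPT-4 than for Gemini even though ChatGPT-4 scores higher in both languages.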

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a97/11373487/1b8ddaa703b4/13104_2024_6920_Figa_HTML.jpg