Similar Articles

1. Comparison of performance of artificial intelligence tools in answering emergency medicine question pool: ChatGPT 4.0, Google Gemini and Microsoft Copilot. Pak J Med Sci. 2025 Apr;41(4):968-972. doi: 10.12669/pjms.41.4.11178.
2. Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study. BMC Med Educ. 2024 Jun 26;24(1):694. doi: 10.1186/s12909-024-05630-9.
3. Using large language models (ChatGPT, Copilot, PaLM, Bard, and Gemini) in Gross Anatomy course: Comparative analysis. Clin Anat. 2025 Mar;38(2):200-210. doi: 10.1002/ca.24244. Epub 2024 Nov 21.
4. Assessing the Quality of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study. Cureus. 2024 Sep 23;16(9):e69996. doi: 10.7759/cureus.69996. eCollection 2024 Sep.
5. Claude, ChatGPT, Copilot, and Gemini performance versus students in different topics of neuroscience. Adv Physiol Educ. 2025 Jun 1;49(2):430-437. doi: 10.1152/advan.00093.2024. Epub 2025 Jan 17.
6. Assessing the Readability of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study. Cureus. 2024 Jul 4;16(7):e63865. doi: 10.7759/cureus.63865. eCollection 2024 Jul.
7. Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care. Medicine (Baltimore). 2024 Aug 16;103(33):e39305. doi: 10.1097/MD.0000000000039305.
8. Can Artificial Intelligence Language Models Effectively Address Dental Trauma Questions? Dent Traumatol. 2025 Apr 1. doi: 10.1111/edt.13063.
9. A Comparison of Prostate Cancer Screening Information Quality on Standard and Advanced Versions of ChatGPT, Google Gemini, and Microsoft Copilot: A Cross-Sectional Study. Am J Health Promot. 2025 Jun;39(5):766-776. doi: 10.1177/08901171251316371. Epub 2025 Jan 24.
10. Can artificial intelligence models serve as patient information consultants in orthodontics? BMC Med Inform Decis Mak. 2024 Jul 29;24(1):211. doi: 10.1186/s12911-024-02619-8.

Cited By

1. Exploring the role of DeepSeek-R1, ChatGPT-4, and Google Gemini in medical education: How valid and reliable are they? Pak J Med Sci. 2025 Jul;41(7):1887-1892. doi: 10.12669/pjms.41.7.12183.

References

1. Comparing answers of artificial intelligence systems and clinical toxicologists to questions about poisoning: Can their answers be distinguished? Emergencias. 2024 Oct;36(5):351-358. doi: 10.55633/s3me/082.2024.
2. Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care. Medicine (Baltimore). 2024 Aug 16;103(33):e39305. doi: 10.1097/MD.0000000000039305.
3. Examining the Performance of ChatGPT 3.5 and Microsoft Copilot in Otolaryngology: A Comparative Study with Otolaryngologists' Evaluation. Indian J Otolaryngol Head Neck Surg. 2024 Aug;76(4):3465-3469. doi: 10.1007/s12070-024-04729-1. Epub 2024 May 1.
4. The accuracy of Gemini, GPT-4, and GPT-4o in ECG analysis: A comparison with cardiologists and emergency medicine specialists. Am J Emerg Med. 2024 Oct;84:68-73. doi: 10.1016/j.ajem.2024.07.043. Epub 2024 Jul 30.
5. Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4. J Med Internet Res. 2024 Jun 27;26:e54571. doi: 10.2196/54571.
6. Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study. BMC Med Educ. 2024 Jun 26;24(1):694. doi: 10.1186/s12909-024-05630-9.
7. A Comparative Analysis of ChatGPT, ChatGPT-4, and Google Bard Performances at the Advanced Burn Life Support Exam. J Burn Care Res. 2024 Aug 6;45(4):945-948. doi: 10.1093/jbcr/irae044.
8. Comparative analysis of ChatGPT, Gemini and emergency medicine specialist in ESI triage assessment. Am J Emerg Med. 2024 Jul;81:146-150. doi: 10.1016/j.ajem.2024.05.001. Epub 2024 May 3.
9. Response accuracy of ChatGPT 3.5 Copilot and Gemini in interpreting biochemical laboratory data a pilot study. Sci Rep. 2024 Apr 8;14(1):8233. doi: 10.1038/s41598-024-58964-1.
10. Comparison of emergency medicine specialist, cardiologist, and chat-GPT in electrocardiography assessment. Am J Emerg Med. 2024 Jun;80:51-60. doi: 10.1016/j.ajem.2024.03.017. Epub 2024 Mar 15.

Comparison of performance of artificial intelligence tools in answering emergency medicine question pool: ChatGPT 4.0, Google Gemini and Microsoft Copilot.

Authors

Iskender Aksoy, Merve Kara Arslan

Affiliations

Iskender Aksoy: Department of Emergency Medicine, Faculty of Medicine, Giresun University, 28100, Giresun, Turkey.

Merve Kara Arslan: Department of Emergency Clinic, Bulancak State Hospital, 28300, Bulancak, Giresun, Turkey.

Publication Information

Pak J Med Sci. 2025 Apr;41(4):968-972. doi: 10.12669/pjms.41.4.11178.
PMID: 40290213
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12022595/
Abstract

OBJECTIVE

The use of artificial intelligence tools built on different software architectures for clinical and educational purposes in medicine has attracted considerable interest recently. In this study, we compared the answers given by three artificial intelligence chatbots to an emergency medicine question pool drawn from the Turkish National Medical Specialization Exam. We also investigated how question content, form, and phrasing affected the answers by classifying the questions and examining their stems.

METHODS

Emergency medicine questions from the Medical Specialization Exams administered between 2015 and 2020 were collected and posed to three artificial intelligence models: ChatGPT-4, Gemini, and Copilot. Question length, question type, and the topics of incorrectly answered questions were recorded.
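
In effect, the methods amount to building a small record per exam question with its attributes and each model's answer. Below is a minimal sketch of such a record; all field names, and the choice to measure length in words, are illustrative assumptions rather than details from the paper:

from dataclasses import dataclass, field

@dataclass
class QuestionRecord:
    year: int                    # exam year, 2015-2020
    text: str                    # question stem
    topic: str                   # e.g. "trauma", "burns", "pediatrics"
    question_type: str           # classification by content and form
    correct_answer: str          # official answer key
    model_answers: dict = field(default_factory=dict)  # model name -> answer

    def length(self) -> int:
        # Question length, here counted in words (an assumption;
        # the paper does not specify the unit).
        return len(self.text.split())

    def wrong_models(self) -> list:
        # Models whose answer does not match the answer key.
        return [m for m, a in self.model_answers.items() if a != self.correct_answer]

Aggregating wrong_models() over all records would yield the per-model error rates compared in the results.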

RESULTS

The most successful chatbot by total score was Microsoft Copilot (7.8% error rate), while the least successful was Google Gemini (22.9% error rate) (p<0.001). Notably, all chatbots had their highest error rates on questions about trauma and surgical approaches, and also made mistakes on burns and pediatrics questions. Error rates also rose for questions whose stems contained the root "probability", indicating that question style affected the answers given.
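
The abstract reports per-model error rates and an overall p-value but not the test behind them. Purely as an illustration of how such a comparison is typically computed, the sketch below runs a chi-square test of independence on a correct/incorrect contingency table; the counts are invented placeholders chosen only to roughly mimic the reported error rates, not the study's data:

# Hypothetical comparison of chatbot error rates via a chi-square test.
# Counts are placeholders, NOT the study's data; the paper reports only
# the error rates (Copilot 7.8%, Gemini 22.9%) and p < 0.001.
from scipy.stats import chi2_contingency

models = ["Copilot", "ChatGPT-4", "Gemini"]
# One row per model: [correct, incorrect] answer counts (hypothetical).
observed = [
    [130, 11],   # ~7.8% errors
    [118, 23],   # ~16.3% errors
    [109, 32],   # ~22.7% errors
]

chi2, p, dof, expected = chi2_contingency(observed)
for name, (right, wrong) in zip(models, observed):
    print(f"{name}: error rate = {wrong / (right + wrong):.1%}")
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4g}")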

CONCLUSIONS

Although chatbots show promising success in identifying correct answers, we think examinees should treat them not as a primary source for the exam but as a useful auxiliary tool to support their learning.
