

Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation.

Affiliations

Instituto Paulista de Estudos e Pesquisas em Oftalmologia, Vision Institute - São Paulo (SP), Brazil.

Massachusetts Institute of Technology, Institute for Medical Engineering and Science - Cambridge (MA), USA.

Publication Information

Rev Assoc Med Bras (1992). 2023 Sep 25;69(10):e20230848. doi: 10.1590/1806-9282.20230848. eCollection 2023.

DOI: 10.1590/1806-9282.20230848
PMID: 37792871
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10547492/
Abstract

OBJECTIVE

The aim of this study was to evaluate the performance of ChatGPT-4.0 in answering the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and as a tool to provide feedback on the quality of the examination.

METHODS

Two independent physicians entered all examination questions into ChatGPT-4.0. After comparing the outputs with the test solutions, they classified the large language model's answers as adequate, inadequate, or indeterminate. In cases of disagreement, they adjudicated and reached a consensus on ChatGPT's accuracy. Performance across medical themes and on nullified questions was compared using chi-square analysis.

RESULTS

In the Revalida examination, ChatGPT-4.0 answered 71 (87.7%) questions correctly and 10 (12.3%) incorrectly. There was no statistically significant difference in the proportions of correct answers among different medical themes (p=0.4886). The artificial intelligence model had a lower accuracy of 71.4% in nullified questions, with no statistical difference (p=0.241) between non-nullified and nullified groups.

CONCLUSION

ChatGPT-4.0 showed satisfactory performance for the 2022 Brazilian National Examination for Medical Degree Revalidation. The large language model exhibited worse performance on subjective questions and public healthcare themes. The results of this study suggested that the overall quality of the Revalida examination questions is satisfactory and corroborates the nullified questions.
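The nullified-versus-non-nullified comparison in the Results is a standard test of two proportions on a 2×2 contingency table. A minimal sketch in pure Python of the Pearson chi-square test is below; the row split (7 nullified questions, 5 answered correctly, chosen to be consistent with the reported 71.4%) is an illustrative assumption, since the paper's raw table is not reproduced here:

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic and df=1 p-value for a 2x2 table.

    table = [[a, b], [c, d]] of observed counts.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row = [a + b, c + d]          # row totals
    col = [a + c, b + d]          # column totals
    stat = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row[i] * col[j] / n
            stat += (obs - expected) ** 2 / expected
    # For one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(stat / 2)).
    p = erfc(sqrt(stat / 2))
    return stat, p

# Hypothetical split: 74 non-nullified (66 correct), 7 nullified (5 correct).
stat, p = chi_square_2x2([[66, 8], [5, 2]])
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

Note that with 7 nullified questions one expected cell count falls below 5, where the chi-square approximation is unreliable; Fisher's exact test is usually preferred for tables this sparse, so an exact p-value may differ from the uncorrected statistic computed here.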


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/23f6/10547492/2126ccc5d9f5/1806-9282-ramb-69-10-e20230848-gf01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/23f6/10547492/d8a3a41ffa24/1806-9282-ramb-69-10-e20230848-gf02.jpg

Similar Articles

1. Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation.
Rev Assoc Med Bras (1992). 2023 Sep 25;69(10):e20230848. doi: 10.1590/1806-9282.20230848. eCollection 2023.
2. Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
3. Assessing the Capability of ChatGPT in Answering First- and Second-Order Knowledge Questions on Microbiology as per Competency-Based Medical Education Curriculum.
Cureus. 2023 Mar 12;15(3):e36034. doi: 10.7759/cureus.36034. eCollection 2023 Mar.
4. Performance of ChatGPT on Ophthalmology-Related Questions Across Various Examination Levels: Observational Study.
JMIR Med Educ. 2024 Jan 18;10:e50842. doi: 10.2196/50842.
5. Accuracy of ChatGPT on Medical Questions in the National Medical Licensing Examination in Japan: Evaluation Study.
JMIR Form Res. 2023 Oct 13;7:e48023. doi: 10.2196/48023.
6. Performance of ChatGPT as an AI-assisted decision support tool in medicine: a proof-of-concept study for interpreting symptoms and management of common cardiac conditions (AMSTELHEART-2).
Acta Cardiol. 2024 May;79(3):358-366. doi: 10.1080/00015385.2024.2303528. Epub 2024 Feb 13.
7. Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan: Evaluation Study.
JMIR Nurs. 2023 Jun 27;6:e47305. doi: 10.2196/47305.
8. Evaluating capabilities of large language models: Performance of GPT-4 on surgical knowledge assessments.
Surgery. 2024 Apr;175(4):936-942. doi: 10.1016/j.surg.2023.12.014. Epub 2024 Jan 20.
9. ChatGPT Earns American Board Certification in Hand Surgery.
Hand Surg Rehabil. 2024 Jun;43(3):101688. doi: 10.1016/j.hansur.2024.101688. Epub 2024 Mar 27.
10. Artificial intelligence performance in clinical neurology queries: the ChatGPT model.
Neurol Res. 2024 May;46(5):437-443. doi: 10.1080/01616412.2024.2334118. Epub 2024 Mar 24.

Cited By

1. Comparative Performance of Medical Students, ChatGPT-3.5 and ChatGPT-4.0 in Answering Questions From a Brazilian National Medical Exam: Cross-Sectional Questionnaire Study.
JMIR AI. 2025 May 8;4:e66552. doi: 10.2196/66552.
2. Assessing Large Language Models for Medical Question Answering in Portuguese: Open-Source Versus Closed-Source Approaches.
Cureus. 2025 May 15;17(5):e84165. doi: 10.7759/cureus.84165. eCollection 2025 May.
3. Identification of Online Health Information Using Large Pretrained Language Models: Mixed Methods Study.
J Med Internet Res. 2025 May 14;27:e70733. doi: 10.2196/70733.
4. Evaluating the Accuracy of Gemini 2.0 Advanced and ChatGPT 4o in Cataract Knowledge: A Performance Analysis Using Brazilian Council of Ophthalmology Board Exam Questions.
Cureus. 2025 Feb 24;17(2):e79565. doi: 10.7759/cureus.79565. eCollection 2025 Feb.
5. Performance of chatbots in queries concerning fundamental concepts in photochemistry.
Photochem Photobiol. 2024 Nov 4. doi: 10.1111/php.14037.
6. A framework for human evaluation of large language models in healthcare derived from literature review.
NPJ Digit Med. 2024 Sep 28;7(1):258. doi: 10.1038/s41746-024-01258-7.
7. Multimodal Large Language Models in Health Care: Applications, Challenges, and Future Outlook.
J Med Internet Res. 2024 Sep 25;26:e59505. doi: 10.2196/59505.
8. Human versus Artificial Intelligence: ChatGPT-4 Outperforming Bing, Bard, ChatGPT-3.5 and Humans in Clinical Chemistry Multiple-Choice Questions.
Adv Med Educ Pract. 2024 Sep 20;15:857-871. doi: 10.2147/AMEP.S479801. eCollection 2024.
9. Performance of ChatGPT in Solving Questions From the Progress Test (Brazilian National Medical Exam): A Potential Artificial Intelligence Tool in Medical Practice.
Cureus. 2024 Jul 19;16(7):e64924. doi: 10.7759/cureus.64924. eCollection 2024 Jul.
10. Evaluating the performance of ChatGPT-3.5 and ChatGPT-4 on the Taiwan plastic surgery board examination.
Heliyon. 2024 Jul 18;10(14):e34851. doi: 10.1016/j.heliyon.2024.e34851. eCollection 2024 Jul 30.