Assessment of ChatGPT-4 in Family Medicine Board Examinations Using Advanced AI Learning and Analytical Methods: Observational Study.

Affiliations

School of Medicine, University College Cork, Cork, Ireland.

Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada.

Publication Information

JMIR Med Educ. 2024 Oct 8;10:e56128. doi: 10.2196/56128.

Abstract

BACKGROUND

This research explores the capabilities of ChatGPT-4 in passing the American Board of Family Medicine (ABFM) Certification Examination. Addressing a gap in existing literature, where earlier artificial intelligence (AI) models showed limitations in medical board examinations, this study evaluates the enhanced features and potential of ChatGPT-4, especially in document analysis and information synthesis.

OBJECTIVE

The primary goal is to assess whether ChatGPT-4, when provided with extensive preparation resources and when using sophisticated data analysis, can achieve a score equal to or above the passing threshold for the Family Medicine Board Examinations.

METHODS

In this study, ChatGPT-4 was embedded in a specialized subenvironment, "AI Family Medicine Board Exam Taker," designed to closely mimic the conditions of the ABFM Certification Examination. This subenvironment enabled the AI to access and analyze a range of relevant study materials, including a primary medical textbook and supplementary web-based resources. The AI was presented with a series of ABFM-type examination questions, reflecting the breadth and complexity typical of the examination. Emphasis was placed on assessing the AI's ability to interpret and respond to these questions accurately, leveraging its advanced data processing and analysis capabilities within this controlled subenvironment.

RESULTS

In our study, ChatGPT-4's performance was quantitatively assessed on 300 practice ABFM examination questions. The AI achieved a correct response rate of 88.67% (95% CI 85.08%-92.25%) for the Custom Robot version and 87.33% (95% CI 83.57%-91.10%) for the Regular version. Statistical analysis, including the McNemar test (P=.45), indicated no significant difference in accuracy between the 2 versions. In addition, the chi-square test for error-type distribution (P=.32) revealed no significant variation in the pattern of errors across versions. These results highlight ChatGPT-4's capacity for high-level performance and consistency in responding to complex medical examination questions under controlled conditions.
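The reported intervals can be checked with a little arithmetic. Multiplying the percentages by 300 gives 266 correct answers for the Custom Robot version and 262 for the Regular version. This is a back-calculation, since the abstract reports only percentages. The normal-approximation (Wald) interval p ± 1.96·√(p(1 − p)/n), applied to these counts, reproduces the reported bounds. That suggests, but does not confirm, that the authors used this interval. The sketch also includes a McNemar test computed from discordant pairs. The abstract does not report those counts, so `mcnemar` takes them as inputs. The function names `wald_ci` and `mcnemar` are illustrative, not from the study.

```python
import math


def wald_ci(correct: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) using the normal-approximation (Wald) interval."""
    p = correct / total
    se = math.sqrt(p * (1 - p) / total)  # standard error of a binomial proportion
    return p, p - z * se, p + z * se


def mcnemar(b: int, c: int, continuity: bool = False) -> tuple[float, float]:
    """McNemar chi-square test (1 df) for paired binary outcomes.

    b: questions answered correctly by version A but not version B.
    c: questions answered correctly by version B but not version A.
    Concordant pairs (both right or both wrong) do not enter the statistic.
    """
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    stat = diff ** 2 / (b + c)
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Counts back-calculated from the reported percentages of 300 questions.
    for label, correct in [("Custom Robot", 266), ("Regular", 262)]:
        p, lo, hi = wald_ci(correct, 300)
        print(f"{label}: {p:.2%} (95% CI {lo:.2%}-{hi:.2%})")
    # Custom Robot: 88.67% (95% CI 85.08%-92.25%)
    # Regular: 87.33% (95% CI 83.57%-91.10%)
```

Both versions answered the same 300 questions. The discordant counts for Custom Robot versus Regular must therefore satisfy `b - c = 266 - 262 = 4`. The abstract does not report b and c individually. It also does not say whether a continuity correction was applied, so the reported P=.45 is not reproduced here.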

CONCLUSIONS

The study demonstrates that ChatGPT-4, particularly when equipped with specialized preparation and operating in a tailored subenvironment, shows promising potential in handling the intricacies of medical board examinations. While its performance is comparable to the expected standard for passing the ABFM Certification Examination, further advances in AI technology and tailored training methods could strengthen these capabilities. This exploration opens avenues for integrating AI tools such as ChatGPT-4 into medical education and assessment, and underscores the importance of continued advancement and specialized training in medical applications of AI.

Image: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c68b/11479358/9213983331b1/mededu-v10-e56128-g001.jpg
