


Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study.

Affiliations

Department of Gastrointestinal Surgery, The First Affiliated Hospital of Shantou University Medical College, No. 57 Changping Road, Jinping District, Shantou, Guangdong, 515000, China.

Department of Orthopaedics, The First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong, 515000, China.

Publication Information

BMC Med Educ. 2024 Nov 26;24(1):1372. doi: 10.1186/s12909-024-06309-x.

DOI:10.1186/s12909-024-06309-x
PMID:39593041
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11590336/
Abstract

BACKGROUND

This study aimed to evaluate the performance of GPT-3.5, GPT-4, GPT-4o and Google Bard on the United States Medical Licensing Examination (USMLE), the Professional and Linguistic Assessments Board (PLAB), the Hong Kong Medical Licensing Examination (HKMLE) and the National Medical Licensing Examination (NMLE).

METHODS

This study was conducted in June 2023. Four large language models (LLMs) (GPT-3.5, GPT-4, GPT-4o and Google Bard) were applied to four standardized medical examinations (USMLE, PLAB, HKMLE and NMLE). All questions were multiple-choice and were sourced from the question banks of these examinations.

RESULTS

On USMLE Step 1, Step 2 CK and Step 3, GPT-4o achieved accuracy rates of 91.5%, 94.2% and 92.7%; GPT-4 achieved 93.2%, 95.0% and 92.0%; GPT-3.5 achieved 65.6%, 71.6% and 68.5%; and Google Bard achieved 64.3%, 55.6% and 58.1%, respectively. On the PLAB, HKMLE and NMLE, GPT-4o scored 93.3%, 91.7% and 84.9%; GPT-4 scored 86.7%, 89.6% and 69.8%; GPT-3.5 scored 80.0%, 68.1% and 60.4%; and Google Bard scored 54.2%, 71.7% and 61.3%. There were significant differences in the accuracy rates of the four LLMs across the four medical licensing examinations.
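To make the "significant differences" claim concrete, accuracy rates like these can be compared with a Pearson chi-square test on a contingency table of correct/incorrect counts. The sketch below is illustrative only: the abstract does not state which test the authors used, and the per-model question count (100 here) is an assumed placeholder, not a figure from the paper.

```python
# Hypothetical sketch: chi-square test of whether four models' accuracy
# rates on one exam (USMLE Step 1 rates from the abstract) differ.
# The question count n is assumed; the abstract reports only percentages.

rates = {"GPT-4o": 0.915, "GPT-4": 0.932, "GPT-3.5": 0.656, "Bard": 0.643}
n = 100  # assumed number of questions per model

# Build a 4x2 table of (correct, incorrect) counts per model.
table = [[round(p * n), n - round(p * n)] for p in rates.values()]

def chi_square(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

stat = chi_square(table)
df = (len(table) - 1) * (len(table[0]) - 1)  # (4-1) * (2-1) = 3
print(f"chi-square = {stat:.1f} on {df} df")
# Statistic far above the 0.05 critical value for 3 df (7.81),
# consistent with a significant difference between models.
```

With these assumed counts the statistic is large, so the conclusion would not change much for any realistic question-bank size; the real analysis depends on the actual counts in the paper.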

CONCLUSION

GPT-4o performed better on the medical licensing examinations than the other three LLMs. The performance of all four models on the NMLE needs further improvement.

CLINICAL TRIAL NUMBER

Not applicable.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3728/11590336/e0fb8c70f015/12909_2024_6309_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3728/11590336/039df5a97cc1/12909_2024_6309_Fig2_HTML.jpg

Similar Articles

1
Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study.
BMC Med Educ. 2024 Nov 26;24(1):1372. doi: 10.1186/s12909-024-06309-x.
2
ChatGPT-4 Omni Performance in USMLE Disciplines and Clinical Skills: Comparative Analysis.
JMIR Med Educ. 2024 Nov 6;10:e63430. doi: 10.2196/63430.
3
Evaluating the performance of GPT-3.5, GPT-4, and GPT-4o in the Chinese National Medical Licensing Examination.
Sci Rep. 2025 Apr 23;15(1):14119. doi: 10.1038/s41598-025-98949-2.
4
Factors Associated With the Accuracy of Large Language Models in Basic Medical Science Examinations: Cross-Sectional Study.
JMIR Med Educ. 2025 Jan 13;11:e58898. doi: 10.2196/58898.
5
Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.
J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
6
Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study.
JMIR Form Res. 2024 Dec 17;8:e57592. doi: 10.2196/57592.
7
Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study.
J Med Internet Res. 2025 Feb 7;27:e65146. doi: 10.2196/65146.
8
Performance of ChatGPT on Nursing Licensure Examinations in the United States and China: Cross-Sectional Study.
JMIR Med Educ. 2024 Oct 3;10:e52746. doi: 10.2196/52746.
9
Comparing the Performance of Popular Large Language Models on the National Board of Medical Examiners Sample Questions.
Cureus. 2024 Mar 11;16(3):e55991. doi: 10.7759/cureus.55991. eCollection 2024 Mar.
10
Evaluating the Effectiveness of advanced large language models in medical Knowledge: A Comparative study using Japanese national medical examination.
Int J Med Inform. 2025 Jan;193:105673. doi: 10.1016/j.ijmedinf.2024.105673. Epub 2024 Oct 28.

Cited By

1
Assessing accuracy and legitimacy of multimodal large language models on Japan Diagnostic Radiology Board Examination.
Jpn J Radiol. 2025 Sep 12. doi: 10.1007/s11604-025-01861-y.
2
Large language models underperform in European general surgery board examinations: a comparative study with experts and surgical residents.
BMC Med Educ. 2025 Aug 23;25(1):1193. doi: 10.1186/s12909-025-07856-7.
3
Foundation models and intelligent decision-making: Progress, challenges, and perspectives.
Innovation (Camb). 2025 May 12;6(6):100948. doi: 10.1016/j.xinn.2025.100948. eCollection 2025 Jun 2.
4
Comparative analysis of large language models in clinical diagnosis: performance evaluation across common and complex medical cases.
JAMIA Open. 2025 Jun 12;8(3):ooaf055. doi: 10.1093/jamiaopen/ooaf055. eCollection 2025 Jun.
5
A Comparative Analysis of GPT-4o and ERNIE Bot in a Chinese Radiation Oncology Exam.
J Cancer Educ. 2025 May 26. doi: 10.1007/s13187-025-02652-9.