Performance of ChatGPT incorporated chain-of-thought method in bilingual nuclear medicine physician board examinations.

Author Information

Ting Yu-Ting, Hsieh Te-Chun, Wang Yuh-Feng, Kuo Yu-Chieh, Chen Yi-Jin, Chan Pak-Ki, Kao Chia-Hung

Affiliations

Department of Nuclear Medicine and PET Center, China Medical University Hospital, China Medical University, Taichung.

Department of Biomedical Imaging and Radiological Science, China Medical University, Taichung.

Publication Information

Digit Health. 2024 Jan 5;10:20552076231224074. doi: 10.1177/20552076231224074. eCollection 2024 Jan-Dec.


DOI: 10.1177/20552076231224074
PMID: 38188855
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10771043/
Abstract

OBJECTIVE: This research explores the performance of ChatGPT, compared with human doctors, on the bilingual (Mandarin Chinese and English) medical specialty examination in nuclear medicine in Taiwan.

METHODS: The study employed the generative pre-trained transformer GPT-4 together with the chain-of-thought (COT) method, which improves performance by eliciting and explaining the reasoning process so that questions are answered in a coherent, logical manner. Questions from the Taiwanese Nuclear Medicine Specialty Exam served as the test material. The analysis examined the correctness of AI responses across the different sections of the exam and explored how question length and language proportion influenced accuracy.

RESULTS: AI, especially ChatGPT with COT, exhibited exceptional capability in theoretical knowledge, clinical medicine, and integrated questions, often surpassing or matching the performance of human doctors. However, AI struggled with questions on medical regulations. The question-length analysis showed that questions in the 109-163 word range yielded the highest accuracy. Moreover, a higher proportion of English words in a question improved both AI and human accuracy.

CONCLUSIONS: This research highlights both the potential and the challenges of AI in the medical field. ChatGPT demonstrates significant competence across many areas of medical knowledge, although areas such as medical regulations require improvement. The study also suggests that AI may help in evaluating exam question difficulty and in maintaining fairness in examinations. These findings shed light on AI's role in medicine, with potential applications in healthcare education, exam preparation, and multilingual environments. Ongoing advancements are expected to further enhance AI's utility in the medical domain.
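The METHODS section describes eliciting chain-of-thought reasoning from GPT-4 before it commits to an answer. The paper's exact prompts are not reproduced here, so the following is a minimal sketch of what such a setup could look like with the OpenAI Python SDK; the system prompt wording and the sample question are illustrative assumptions, not the authors' protocol.

```python
# Hedged sketch of chain-of-thought (COT) prompting for a bilingual
# multiple-choice exam question. Prompt wording and the sample question
# are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

COT_INSTRUCTION = (
    "You are taking a bilingual (Mandarin Chinese and English) nuclear "
    "medicine board examination. Think step by step: explain your "
    "reasoning in a coherent, logical manner, then state the single "
    "best answer as one letter."
)

def answer_with_cot(question: str) -> str:
    """Ask the model to reason through one exam question before answering."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model name; the paper reports using GPT-4
        messages=[
            {"role": "system", "content": COT_INSTRUCTION},
            {"role": "user", "content": question},
        ],
        temperature=0,  # deterministic output makes scoring reproducible
    )
    return response.choices[0].message.content

# Hypothetical mixed-language question in the exam's bilingual style.
sample = (
    "下列何者是 F-18 FDG PET 最常見的臨床適應症? "
    "(A) bone densitometry (B) oncologic staging "
    "(C) renal clearance (D) thyroid uptake"
)
print(answer_with_cot(sample))
```

The RESULTS section reports accuracy as a function of question length and of the proportion of English words. One simple way to reproduce that kind of analysis, assuming a list of (question, correct?) records, is to bin questions by an English-token ratio and average correctness per bin; the bin edges and placeholder data below are assumptions, not the paper's.

```python
# Sketch of binning exam results by the share of English tokens per question.
import re

def english_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens made of ASCII word characters."""
    tokens = text.split()
    english = sum(1 for t in tokens if re.fullmatch(r"[A-Za-z0-9().,%-]+", t))
    return english / len(tokens) if tokens else 0.0

# Placeholder records: (question text, was the answer correct?).
records = [("question one ...", True), ("question two ...", False)]

bins: dict[str, list[bool]] = {"<1/3 English": [], "1/3-2/3": [], ">2/3": []}
for text, correct in records:
    r = english_ratio(text)
    key = "<1/3 English" if r < 1 / 3 else "1/3-2/3" if r < 2 / 3 else ">2/3"
    bins[key].append(correct)

for key, hits in bins.items():
    acc = sum(hits) / len(hits) if hits else float("nan")
    print(f"{key}: n={len(hits)}, accuracy={acc:.2f}")
```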

Figures (full-size images on PMC):
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ff/10771043/3693775f86e1/10.1177_20552076231224074-fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ff/10771043/e193122ef6d2/10.1177_20552076231224074-fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ff/10771043/48eea8cb547f/10.1177_20552076231224074-fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ff/10771043/2d8f94c626d9/10.1177_20552076231224074-fig4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ff/10771043/25f2b1840667/10.1177_20552076231224074-fig5.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ff/10771043/7ad0232e64ef/10.1177_20552076231224074-fig6.jpg

Similar Articles

[1]
Performance of ChatGPT incorporated chain-of-thought method in bilingual nuclear medicine physician board examinations.

Digit Health. 2024-1-5

[2]
Assessment of ChatGPT-4 in Family Medicine Board Examinations Using Advanced AI Learning and Analytical Methods: Observational Study.

JMIR Med Educ. 2024-10-8

[3]
Advancing Medical Education: Performance of Generative Artificial Intelligence Models on Otolaryngology Board Preparation Questions With Image Analysis Insights.

Cureus. 2024-7-9

[4]
Performance and exploration of ChatGPT in medical examination, records and education in Chinese: Pave the way for medical AI.

Int J Med Inform. 2023-9

[5]
Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study.

JMIR Med Educ. 2024-4-29

[6]
Evaluating the performance of ChatGPT-3.5 and ChatGPT-4 on the Taiwan plastic surgery board examination.

Heliyon. 2024-7-18

[7]
Probing artificial intelligence in neurosurgical training: ChatGPT takes a neurosurgical residents written exam.

Brain Spine. 2023-11-29

[8]
Inadequate Performance of ChatGPT on Orthopedic Board-Style Written Exams.

Cureus. 2024-6-18

[9]
ChatGPT failed Taiwan's Family Medicine Board Exam.

J Chin Med Assoc. 2023-8-1

[10]
Performance of ChatGPT on Stage 1 of the Taiwanese medical licensing exam.

Digit Health. 2024-2-16

Cited By

[1]
Large Language Models for CAD-RADS 2.0 Extraction From Semi-Structured Coronary CT Angiography Reports: A Multi-Institutional Study.

Korean J Radiol. 2025-9

[2]
Evaluating performance of large language models for atrial fibrillation management using different prompting strategies and languages.

Sci Rep. 2025-5-30

[3]
Performance of ChatGPT-4 on Taiwanese Traditional Chinese Medicine Licensing Examinations: Cross-Sectional Study.

JMIR Med Educ. 2025-3-19

[4]
Accuracy of Large Language Models for Literature Screening in Thoracic Surgery: Diagnostic Study.

J Med Internet Res. 2025-3-11

[5]
Evaluating AI Competence in Specialized Medicine: Comparative Analysis of ChatGPT and Neurologists in a Neurology Specialist Examination in Spain.

JMIR Med Educ. 2024-11-14

[6]
Exploring the opportunities of large language models for summarizing palliative care consultations: A pilot comparative study.

Digit Health. 2024-11-20

[7]
Prompt Engineering Paradigms for Medical Applications: Scoping Review.

J Med Internet Res. 2024-9-10

[8]
Evaluation of the quality and readability of ChatGPT responses to frequently asked questions about myopia in traditional Chinese language.

Digit Health. 2024-9-2
