Ting Yu-Ting, Hsieh Te-Chun, Wang Yuh-Feng, Kuo Yu-Chieh, Chen Yi-Jin, Chan Pak-Ki, Kao Chia-Hung
Department of Nuclear Medicine and PET Center, China Medical University Hospital, China Medical University, Taichung.
Department of Biomedical Imaging and Radiological Science, China Medical University, Taichung.
Digit Health. 2024 Jan 5;10:20552076231224074. doi: 10.1177/20552076231224074. eCollection 2024 Jan-Dec.
OBJECTIVE: This research explores the performance of ChatGPT, compared with human doctors, on a bilingual (Mandarin Chinese and English) medical specialty examination in nuclear medicine in Taiwan.
METHODS: The study employed the generative pre-trained transformer GPT-4 and integrated the chain-of-thought (CoT) prompting method, which enhances performance by eliciting an explicit, step-by-step reasoning process so that questions are answered coherently and logically. Questions from the Taiwanese Nuclear Medicine Specialty Exam served as the basis for testing. The research analyzed the correctness of AI responses in different sections of the exam and explored the influence of question length and language proportion on accuracy.
RESULTS: AI, especially ChatGPT with CoT, exhibited exceptional capabilities in theoretical knowledge, clinical medicine, and handling integrated questions, often surpassing or matching human doctors' performance. However, AI struggled with questions related to medical regulations. The analysis of question length showed that questions in the 109–163 word range yielded the highest accuracy. Moreover, a higher proportion of English words in a question improved both AI and human accuracy.
CONCLUSIONS: This research highlights the potential and challenges of AI in the medical field. ChatGPT demonstrates significant competence in various aspects of medical knowledge, although areas such as medical regulations require improvement. The study also suggests that AI may help in evaluating exam question difficulty and maintaining fairness in examinations. These findings shed light on AI's role in the medical field, with potential applications in healthcare education, exam preparation, and multilingual environments. Ongoing advancements are expected to further enhance AI's utility in the medical domain.
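The abstract reports that question length and the proportion of English words in each bilingual question were analyzed as predictors of accuracy. The paper does not specify how these features were tokenized; the sketch below shows one plausible way to compute them for a mixed Mandarin/English question, treating each English word and each CJK character as one token (an assumption, not the authors' stated method).

```python
import re

# Hypothetical feature extraction for a bilingual exam question.
# English words are matched as runs of Latin letters; Chinese text is
# counted character by character, since written Chinese is unspaced.
EN_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")

def question_features(text: str) -> dict:
    """Return total token count and the proportion of English tokens."""
    en = EN_WORD.findall(text)   # e.g. "PET", "FDG", "labeled"
    zh = CJK_CHAR.findall(text)  # one token per Chinese character
    total = len(en) + len(zh)
    return {
        "length": total,
        "english_proportion": len(en) / total if total else 0.0,
    }

if __name__ == "__main__":
    q = "下列何者為 PET 常用之放射性示蹤劑? FDG is labeled with F-18."
    print(question_features(q))
```

Features like these could then be binned (e.g. into the 109–163 token range the study found most accurate) and correlated with per-question correctness for both the model and the human examinees.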