Takagi Soshi, Watari Takashi, Erabi Ayano, Sakaguchi Kota
Faculty of Medicine, Shimane University, Izumo, Japan.
General Medicine Center, Shimane University Hospital, Izumo, Japan.
JMIR Med Educ. 2023 Jun 29;9:e48002. doi: 10.2196/48002.
The competence of ChatGPT (Chat Generative Pre-trained Transformer) in non-English languages is not well studied.
This study compared the performances of GPT-3.5 (Generative Pre-trained Transformer) and GPT-4 on the Japanese Medical Licensing Examination (JMLE) to evaluate the reliability of these models for clinical reasoning and medical knowledge in non-English languages.
This study used the default mode of ChatGPT, which is based on GPT-3.5; the GPT-4 model of ChatGPT Plus; and the 117th JMLE, administered in 2023. A total of 254 questions were included in the final analysis and categorized into 3 types: general, clinical, and clinical sentence questions.
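For readers who want to run a comparison of this kind programmatically, the following is a minimal sketch of how the per-category evaluation could be scripted with the OpenAI Python SDK. Note that the study itself entered questions through the ChatGPT interface (GPT-3.5 default mode and GPT-4 via ChatGPT Plus), not the API, and the file name, question schema, and prompt wording below are illustrative assumptions, not the authors' materials.

# Hedged sketch: automating a JMLE-style evaluation with the OpenAI SDK (openai>=1.0).
# The question file, its fields, and the prompt are assumed for illustration only.
import json
from collections import defaultdict
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(model: str, question: str) -> str:
    """Send one multiple-choice question and return the model's raw answer."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Answer this Japanese medical licensing exam question. "
                        "Reply only with the letter(s) of the correct choice(s)."},
            {"role": "user", "content": question},
        ],
        temperature=0,  # keep answers as deterministic as possible for grading
    )
    return response.choices[0].message.content.strip()

def evaluate(model: str, items: list[dict]) -> dict:
    """Accuracy per question category (general / clinical / clinical sentence)."""
    correct, total = defaultdict(int), defaultdict(int)
    for item in items:  # assumed schema: {"category", "question", "answer"}
        reply = ask(model, item["question"])
        total[item["category"]] += 1
        if item["answer"] in reply:
            correct[item["category"]] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

if __name__ == "__main__":
    with open("jmle_117_questions.json", encoding="utf-8") as f:  # hypothetical file
        questions = json.load(f)
    for model in ("gpt-3.5-turbo", "gpt-4"):
        print(model, evaluate(model, questions))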
The results indicated that GPT-4 outperformed GPT-3.5 in accuracy across all three question types (general, clinical, and clinical sentence). GPT-4 also performed better on difficult questions and on questions about specific diseases. Furthermore, GPT-4 met the passing criteria for the JMLE, indicating its reliability for clinical reasoning and medical knowledge in a non-English language.
GPT-4 could become a valuable tool for medical education and clinical support in non-English-speaking regions, such as Japan.