Tseng Liang-Wei, Lu Yi-Chin, Tseng Liang-Chi, Chen Yu-Chun, Chen Hsing-Yu
Division of Chinese Acupuncture and Traumatology, Center for Traditional Chinese Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan.
Division of Chinese Internal Medicine, Center for Traditional Chinese Medicine, Chang Gung Memorial Hospital, No. 123, Dinghu Rd, Gueishan Dist, Taoyuan, 33378, Taiwan, 886 3 3196200 ext 2611, 886 3 3298995.
JMIR Med Educ. 2025 Mar 19;11:e58897. doi: 10.2196/58897.
The integration of artificial intelligence (AI), notably ChatGPT, into medical education has shown promising results in various medical fields. Nevertheless, its efficacy in traditional Chinese medicine (TCM) examinations remains understudied.
This study aims to (1) assess the performance of ChatGPT on the TCM licensing examination in Taiwan and (2) evaluate the model's explainability in answering TCM-related questions to determine its suitability as a TCM learning tool.
We used the GPT-4 model to answer 480 questions from the 2022 TCM licensing examination in Taiwan. The model's performance was compared against that of licensed TCM doctors under 2 prompting approaches: direct answer selection and provision of an explanation before answer selection. The accuracy and consistency of the AI-generated responses were analyzed. In addition, questions were characterized by cognitive level, depth of knowledge, question type, vignette style, and polarity.
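The two-approach evaluation protocol described above could be sketched as follows. This is an illustrative assumption, not the authors' actual code: the `Question` fields, prompt wording, and answer-extraction regex are hypothetical, and the `model` callable stands in for a real GPT-4 API call.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class Question:
    stem: str
    options: dict[str, str]  # e.g. {"A": "...", "B": "..."}
    answer: str              # gold answer key, e.g. "A"

def direct_prompt(q: Question) -> str:
    # Approach 1: ask for the answer letter only.
    opts = "\n".join(f"{k}. {v}" for k, v in q.options.items())
    return f"{q.stem}\n{opts}\nAnswer with a single letter."

def explain_prompt(q: Question) -> str:
    # Approach 2: ask for reasoning first, then the final answer.
    opts = "\n".join(f"{k}. {v}" for k, v in q.options.items())
    return (f"{q.stem}\n{opts}\n"
            "Explain your reasoning, then give the final answer "
            "on the last line as 'Answer: <letter>'.")

def extract_letter(reply: str) -> str:
    # Take the last standalone option letter mentioned in the reply.
    letters = re.findall(r"\b([A-D])\b", reply)
    return letters[-1] if letters else ""

def accuracy(questions: list[Question],
             model: Callable[[str], str],
             prompt_fn: Callable[[Question], str]) -> float:
    correct = sum(extract_letter(model(prompt_fn(q))) == q.answer
                  for q in questions)
    return correct / len(questions)
```

In practice, `model` would wrap a chat-completion request, and each question would be scored under both `direct_prompt` and `explain_prompt` to compare the two approaches.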
ChatGPT achieved an overall accuracy of 43.9%, lower than that of the 2 human participants (70% and 78.4%). The analysis revealed no significant correlation between model accuracy and question characteristics. An in-depth error review indicated that errors predominantly stemmed from misunderstanding of TCM concepts (55.3%), underscoring the limitations of the model's TCM knowledge base and reasoning capability.
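The reported absence of a significant association between question characteristics and accuracy could be checked with a chi-square statistic over a contingency table of correct/incorrect counts per category. The sketch below is a stdlib-only assumption about how such an analysis might look, not the paper's actual code, and the counts in the usage note are made up.

```python
def chi_square(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for a contingency table.

    Each row is one question category; columns are
    [correct_count, incorrect_count].
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2
```

For example, `chi_square([[30, 10], [10, 30]])` yields 20.0, while a table with identical accuracy across categories, such as `[[10, 10], [10, 10]]`, yields 0.0; the statistic would then be compared against the chi-square distribution with `(rows - 1) * (cols - 1)` degrees of freedom.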
Although ChatGPT shows promise as an educational tool, its current performance on the TCM licensing examination falls short of that of licensed practitioners. This highlights the need to enhance AI models with specialized TCM training and warrants a cautious approach to using AI for TCM education. Future research should focus on model improvement and on developing tailored educational applications to support TCM learning.