

Performance of ChatGPT-4 on Taiwanese Traditional Chinese Medicine Licensing Examinations: Cross-Sectional Study.

Author Information

Tseng Liang-Wei, Lu Yi-Chin, Tseng Liang-Chi, Chen Yu-Chun, Chen Hsing-Yu

Affiliations

Division of Chinese Acupuncture and Traumatology, Center of Traditional Chinese Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan.

Division of Chinese Internal Medicine, Center for Traditional Chinese Medicine, Chang Gung Memorial Hospital, No. 123, Dinghu Rd, Gueishan Dist, Taoyuan, 33378, Taiwan, 886 3 3196200 ext 2611, 886 3 3298995.

Publication Information

JMIR Med Educ. 2025 Mar 19;11:e58897. doi: 10.2196/58897.

DOI: 10.2196/58897
PMID: 40106227
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11939018/
Abstract

BACKGROUND

The integration of artificial intelligence (AI), notably ChatGPT, into medical education has shown promising results in various medical fields. Nevertheless, its efficacy in traditional Chinese medicine (TCM) examinations remains understudied.

OBJECTIVE

This study aims to (1) assess the performance of ChatGPT on the TCM licensing examination in Taiwan and (2) evaluate the model's explainability in answering TCM-related questions to determine its suitability as a TCM learning tool.

METHODS

We used the GPT-4 model to respond to 480 questions from the 2022 TCM licensing examination. This study compared the performance of the model against that of licensed TCM doctors using 2 approaches, namely direct answer selection and provision of explanations before answer selection. The accuracy and consistency of AI-generated responses were analyzed. Moreover, a breakdown of question characteristics was performed based on the cognitive level, depth of knowledge, types of questions, vignette style, and polarity of questions.

RESULTS

ChatGPT achieved an overall accuracy of 43.9%, which was lower than that of 2 human participants (70% and 78.4%). The analysis did not reveal a significant correlation between the accuracy of the model and the characteristics of the questions. An in-depth examination indicated that errors predominantly resulted from a misunderstanding of TCM concepts (55.3%), emphasizing the limitations of the model with regard to its TCM knowledge base and reasoning capability.

CONCLUSIONS

Although ChatGPT shows promise as an educational tool, its current performance on TCM licensing examinations is lacking. This highlights the need for enhancing AI models with specialized TCM training and suggests a cautious approach to utilizing AI for TCM education. Future research should focus on model improvement and the development of tailored educational applications to support TCM learning.


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fae8/11939018/4f4c6f1a2fe2/mededu-v11-e58897-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fae8/11939018/4ee49323c3bb/mededu-v11-e58897-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fae8/11939018/b6e9cc463453/mededu-v11-e58897-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fae8/11939018/267029c9d075/mededu-v11-e58897-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fae8/11939018/39dcc51410ff/mededu-v11-e58897-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fae8/11939018/1c4b987c44f1/mededu-v11-e58897-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fae8/11939018/1733cc31344b/mededu-v11-e58897-g007.jpg

Similar Articles

1. Performance of ChatGPT-4 on Taiwanese Traditional Chinese Medicine Licensing Examinations: Cross-Sectional Study. JMIR Med Educ. 2025 Mar 19;11:e58897. doi: 10.2196/58897.
2. Assessing question characteristic influences on ChatGPT's performance and response-explanation consistency: Insights from Taiwan's Nursing Licensing Exam. Int J Nurs Stud. 2024 May;153:104717. doi: 10.1016/j.ijnurstu.2024.104717. Epub 2024 Feb 8.
3. Performance of ChatGPT-3.5 and ChatGPT-4 in the Taiwan National Pharmacist Licensing Examination: Comparative Evaluation Study. JMIR Med Educ. 2025 Jan 17;11:e56850. doi: 10.2196/56850.
4. While GPT-3.5 is unable to pass the Physician Licensing Exam in Taiwan, GPT-4 successfully meets the criteria. J Chin Med Assoc. 2025 May 1;88(5):352-360. doi: 10.1097/JCMA.0000000000001225. Epub 2025 Mar 14.
5. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis. J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
6. Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study. JMIR Med Educ. 2024 Apr 29;10:e55048. doi: 10.2196/55048.
7. Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis. BMC Med Educ. 2024 Sep 16;24(1):1013. doi: 10.1186/s12909-024-05944-8.
8. Comparison of the Performance of GPT-3.5 and GPT-4 With That of Medical Students on the Written German Medical Licensing Examination: Observational Study. JMIR Med Educ. 2024 Feb 8;10:e50965. doi: 10.2196/50965.
9. Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study. J Med Internet Res. 2025 Feb 7;27:e65146. doi: 10.2196/65146.
10. Performance of ChatGPT on Nursing Licensure Examinations in the United States and China: Cross-Sectional Study. JMIR Med Educ. 2024 Oct 3;10:e52746. doi: 10.2196/52746.

Cited By

1. Assessing the adherence of large language models to clinical practice guidelines in Chinese medicine: a content analysis. Front Pharmacol. 2025 Jul 25;16:1649041. doi: 10.3389/fphar.2025.1649041. eCollection 2025.

References

1. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis. J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
2. Performance of ChatGPT on Chinese national medical licensing examinations: a five-year examination evaluation study for physicians, pharmacists and nurses. BMC Med Educ. 2024 Feb 14;24(1):143. doi: 10.1186/s12909-024-05125-7.
3. Opportunities and challenges of traditional Chinese medicine doctors in the era of artificial intelligence. Front Med (Lausanne). 2024 Jan 11;10:1336175. doi: 10.3389/fmed.2023.1336175. eCollection 2023.
4. Performance of ChatGPT incorporated chain-of-thought method in bilingual nuclear medicine physician board examinations. Digit Health. 2024 Jan 5;10:20552076231224074. doi: 10.1177/20552076231224074. eCollection 2024 Jan-Dec.
5. GPT-4 can pass the Korean National Licensing Examination for Korean Medicine Doctors. PLOS Digit Health. 2023 Dec 15;2(12):e0000416. doi: 10.1371/journal.pdig.0000416. eCollection 2023 Dec.
6. Exploring the potential of ChatGPT for clinical reasoning and decision-making: a cross-sectional study on the Italian Medical Residency Exam. Ann Ist Super Sanita. 2023 Oct-Dec;59(4):267-270. doi: 10.4415/ANN_23_04_05.
7. Performance Comparison of ChatGPT-4 and Japanese Medical Residents in the General Medicine In-Training Examination: Comparison Study. JMIR Med Educ. 2023 Dec 6;9:e52202. doi: 10.2196/52202.
8. How does ChatGPT-4 preform on non-English national medical licensing examination? An evaluation in Chinese language. PLOS Digit Health. 2023 Dec 1;2(12):e0000397. doi: 10.1371/journal.pdig.0000397. eCollection 2023 Dec.
9. Evaluation of the performance of GPT-3.5 and GPT-4 on the Polish Medical Final Examination. Sci Rep. 2023 Nov 22;13(1):20512. doi: 10.1038/s41598-023-46995-z.
10. Performance of ChatGPT on Registered Nurse License Exam in Taiwan: A Descriptive Study. Healthcare (Basel). 2023 Oct 30;11(21):2855. doi: 10.3390/healthcare11212855.