Keshtkar Alireza, Hayat Ali-Asghar, Atighi Farnaz, Ayare Nazanin, Keshtkar Mohammadreza, Yazdanpanahi Parsa, Sadeghi Erfan, Deilami Noushin, Reihani Hamid, Karimi Alireza, Mokhtari Hamidreza, Hashempur Mohammad Hashem
Research Center of Noncommunicable Diseases, Jahrom University of Medical Sciences, Jahrom, Iran.
Clinical Education Research Center, Department of Medical Education, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran.
Med J Islam Repub Iran. 2025 Feb 11;39:24. doi: 10.47176/mjiri.39.24. eCollection 2025.
OpenAI's ChatGPT is a language model built on a 175-billion-parameter transformer architecture that performs natural language processing tasks. This study aims to evaluate the knowledge and interpretive abilities of ChatGPT on three levels of the Iranian medical license exams: basic sciences, pre-internship, and pre-residency.
In this comparative study, three levels of Iran's medical license exams (basic sciences, pre-internship, and pre-residency) were administered to ChatGPT 3.5. Two versions of each exam were used, chosen relative to ChatGPT 3.5's training data cutoff: one exam from before the cutoff and one from after. The exams were presented to ChatGPT in both Persian and English, and the accuracy and concordance of each response were rated by two blinded adjudicators.
A total of 2210 questions, including 667 basic sciences, 763 pre-internship, and 780 pre-residency questions, were presented to ChatGPT in both English and Persian languages. Across all tests, the overall accuracy was found to be 48.5%, with an overall concordance of 91%. Notably, English questions exhibited higher accuracy and concordance rates, with 61.4% accuracy and 94.5% concordance, compared to 35.7% accuracy and 88.7% concordance for Persian questions.
Our findings demonstrate that ChatGPT scores above the required passing thresholds on the basic sciences and pre-internship exams. Moreover, ChatGPT achieved the minimum score needed to apply for residency positions in Iran, although its score was lower than the applicants' mean. Notably, the model provided reasoning and contextual information in the majority of its responses. These results provide compelling evidence for the potential use of ChatGPT in medical education.