Surapaneni Krishna Mohan
Panimalar Medical College Hospital & Research Institute, Chennai, India.
JMIR Med Educ. 2023 Nov 7;9:e47191. doi: 10.2196/47191.
ChatGPT has recently gained global attention for its strong performance in generating a wide range of information and retrieving data almost instantaneously. It has also been tested on the United States Medical Licensing Examination (USMLE) and passed it. Its usability in medical education is therefore now a key topic of discussion worldwide.
This study aimed to evaluate the performance of ChatGPT in medical biochemistry using clinical case vignettes.
The performance of ChatGPT in medical biochemistry was evaluated using 10 randomly selected clinical case vignettes. Each vignette was entered into ChatGPT together with its response options, and each case was tested twice. The answers generated by ChatGPT were saved and checked against our reference material.
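The checking step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the study: the names `CaseResult`, `score_attempts`, and `summarize` are hypothetical, and the answer letters are placeholder values rather than the study's vignettes or results. It assumes each case has a single keyed option and that the two attempts are recorded as the option letter ChatGPT selected.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    case_id: int
    first_correct: bool    # attempt 1 matches the reference answer
    second_correct: bool   # attempt 2 matches the reference answer
    consistent: bool       # both attempts gave the same answer


def score_attempts(reference, first, second):
    """Compare two attempts per case against the reference answer key."""
    results = []
    for case_id, key in reference.items():
        a1, a2 = first[case_id], second[case_id]
        results.append(CaseResult(case_id, a1 == key, a2 == key, a1 == a2))
    return results


def summarize(results):
    """Count correct answers per attempt and list cases whose answers changed."""
    return {
        "n_cases": len(results),
        "first_correct": sum(r.first_correct for r in results),
        "second_correct": sum(r.second_correct for r in results),
        "inconsistent_cases": [r.case_id for r in results if not r.consistent],
    }


# Placeholder data for illustration only (NOT the study's cases or answers).
reference = {1: "A", 2: "C", 3: "B"}
first = {1: "A", 2: "D", 3: "B"}
second = {1: "A", 2: "C", 3: "D"}

summary = summarize(score_attempts(reference, first, second))
print(summary)
```

Tracking consistency separately from correctness matters here because a model can change its answer between attempts whether or not either answer is right.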
ChatGPT generated correct answers for 4 of the 10 questions on the first attempt. For the other cases, the responses differed between the first and second attempts. On the second attempt, ChatGPT answered 6 questions correctly and 4 incorrectly. Surprisingly, for case 3, repeated attempts yielded different answers. We believe this was due to the complexity of the case, which required a balanced approach to several critical medical aspects of amino acid metabolism.
According to our findings, ChatGPT may not be considered an accurate information provider for use in medical education to improve learning and assessment. However, our study was limited by its small sample size (10 clinical case vignettes) and by the use of the publicly available version of ChatGPT (version 3.5). Although artificial intelligence (AI) has the capability to transform medical education, we emphasize that the data produced by such AI systems must be validated for correctness and dependability before being implemented in practice.