Shang Luxiang, Li Rui, Xue Mingyue, Guo Qilong, Hou Yinglong
Department of Cardiology, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, China.
Medical Science and Technology Innovation Center, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China.
Med Teach. 2025 May;47(5):858-864. doi: 10.1080/0142159X.2024.2377808. Epub 2024 Jul 12.
The purpose of this study was to assess the utility of information generated by ChatGPT for residency education in China.
We designed a three-step survey to evaluate ChatGPT's performance in China's residency training education, covering residency final examination questions, patient cases, and resident satisfaction scores. First, 204 questions from the residency final examination were entered into ChatGPT's interface to determine the percentage of correct answers. Next, ChatGPT was asked to generate 20 clinical cases, which were then evaluated by three instructors using a pre-designed 5-point Likert scale. Case quality was assessed on five criteria: clarity, relevance, logicality, credibility, and comprehensiveness. Finally, 31 third-year residents took part in interactive sessions with ChatGPT. Their perceptions of ChatGPT's feedback were assessed with a Likert scale covering ease of use, accuracy and completeness of responses, and effectiveness in enhancing their understanding of medical knowledge.
Our results showed that ChatGPT-3.5 correctly answered 45.1% of the exam questions. For the virtual patient cases, clinical instructors gave ChatGPT mean ratings of 4.57 ± 0.50, 4.68 ± 0.47, 4.77 ± 0.46, 4.60 ± 0.53, and 3.95 ± 0.59 points for clarity, relevance, logicality, credibility, and comprehensiveness, respectively. Among the residents, ChatGPT scored 4.48 ± 0.70, 4.00 ± 0.82, and 4.61 ± 0.50 points for ease of use, accuracy and completeness, and usefulness, respectively.
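As an illustrative sketch of how the per-criterion mean ± SD summaries above can be produced from 5-point Likert scores, the snippet below uses hypothetical rating values (not the study's raw data) and Python's standard `statistics` module. Whether the authors used the population or sample standard deviation is not stated; population SD is assumed here.

```python
# Hedged sketch: ratings are invented for illustration, not taken from the study.
from statistics import mean, pstdev

# Hypothetical 5-point Likert ratings from instructors for two criteria.
ratings = {
    "clarity": [5, 4, 5, 4, 5],
    "relevance": [5, 5, 4, 5, 4],
}

def summarize(scores):
    """Return (mean, population SD), each rounded to two decimals.

    pstdev (population SD) is an assumption; a study might instead
    report the sample SD (statistics.stdev).
    """
    return round(mean(scores), 2), round(pstdev(scores), 2)

for criterion, scores in ratings.items():
    m, sd = summarize(scores)
    print(f"{criterion}: {m} \u00b1 {sd}")
```

Running this prints one "mean ± SD" line per criterion, the same format used in the results paragraph above.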
Our findings demonstrate ChatGPT's immense potential for personalized Chinese medical education.