Breeding Tessa, Martinez Brian, Patel Heli, Nasef Hazem, Arif Hasan, Nakayama Don, Elkbuli Adel
Kiran Patel College of Allopathic Medicine, Nova Southeastern University, Fort Lauderdale, FL, USA.
Mercer University School of Medicine, Columbus, GA, USA.
Am Surg. 2024 Apr;90(4):560-566. doi: 10.1177/00031348231180950. Epub 2023 Jun 13.
ChatGPT has substantial potential to revolutionize medical education. We aim to assess how medical students and laypeople evaluate information produced by ChatGPT compared to an evidence-based resource on the diagnosis and management of 5 common surgical conditions.
A 60-question anonymous online survey was distributed to third- and fourth-year U.S. medical students and laypeople to evaluate articles produced by ChatGPT and an evidence-based source on clarity, relevance, reliability, validity, organization, and comprehensiveness. Participants received 2 blinded articles, 1 from each source, for each surgical condition. Paired-sample t-tests were used to compare ratings between the 2 sources.
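As a rough illustration of the statistical comparison described above, the short Python sketch below applies a paired-sample t-test (via scipy.stats.ttest_rel) to per-participant ratings of the two blinded articles. The rating values and variable names are hypothetical placeholders, not data from the study.

    # Minimal sketch of a paired-sample t-test comparing ratings of the
    # two blinded articles for one condition. Ratings are hypothetical.
    from scipy import stats

    # Each position: one participant's 1-5 rating of the same condition's
    # article from each source (paired by participant).
    chatgpt_clarity  = [5, 4, 4, 5, 4, 5, 4, 4, 5, 4]
    evidence_clarity = [4, 4, 3, 4, 4, 4, 3, 4, 4, 4]

    t_stat, p_value = stats.ttest_rel(chatgpt_clarity, evidence_clarity)
    print(f"t = {t_stat:.2f}, P = {p_value:.3f}")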
Of 56 survey participants, 50.9% (n = 28) were U.S. medical students and 49.1% (n = 27) were from the general population. Medical students rated ChatGPT articles as significantly clearer (appendicitis: 4.39 vs 3.89, P = .020; diverticulitis: 4.54 vs 3.68, P < .001; small bowel obstruction: 4.43 vs 3.79, P = .003; upper GI bleed: 4.36 vs 3.93, P = .020) and better organized (diverticulitis: 4.36 vs 3.68, P = .021; small bowel obstruction: 4.39 vs 3.82, P = .033) than the evidence-based source. However, for all 5 conditions, medical students found the evidence-based passages more comprehensive than the ChatGPT articles (cholecystitis: 4.04 vs 3.36, P = .009; appendicitis: 4.07 vs 3.36, P = .015; diverticulitis: 4.07 vs 3.36, P = .015; small bowel obstruction: 4.11 vs 3.54, P = .030; upper GI bleed: 4.11 vs 3.29, P = .003).
Medical students perceived ChatGPT articles to be clearer and better organized than evidence-based sources on the pathogenesis, diagnosis, and management of 5 common surgical pathologies. However, evidence-based articles were rated as significantly more comprehensive.