Mayo-Yáñez Miguel, Lechien Jerome R, Maria-Saibene Alberto, Vaira Luigi A, Maniaci Antonino, Chiesa-Estomba Carlos M
Young-Otolaryngologists of the International Federation of Oto-Rhino-Laryngological Societies (YO-IFOS) Study Group, 75000 Paris, France.
Otorhinolaryngology - Head and Neck Surgery Department, Complexo Hospitalario Universitario A Coruña (CHUAC), 15006 A Coruña, Galicia, Spain.
Indian J Otolaryngol Head Neck Surg. 2024 Aug;76(4):3465-3469. doi: 10.1007/s12070-024-04729-1. Epub 2024 May 1.
To evaluate the response capabilities of ChatGPT 3.5 and an internet-connected GPT-4 engine (Microsoft Copilot) on a public healthcare system otolaryngology job competition examination, using the real scores of otolaryngology specialists as the control group. In September 2023, 135 questions, divided into theoretical and practical parts, were input into ChatGPT 3.5 and the internet-connected GPT-4. The accuracy of the AI responses was compared with the official results of the otolaryngologists who took the exam, and statistical analysis was conducted using Stata 14.2. Copilot (GPT-4) outperformed ChatGPT 3.5, scoring 88.5 points versus 60 points for ChatGPT. The two AIs differed in which questions they answered incorrectly. Although ChatGPT performed proficiently, Copilot's performance was superior: it achieved the second-best score among the 108 otolaryngologists who took the exam, whereas ChatGPT placed 83rd. A chat powered by GPT-4 with internet access (Copilot) demonstrates superior performance in answering multiple-choice medical questions compared with ChatGPT 3.5.