Costa Lucas Plens DE Britto, Castro Danilo Henrique Pizzo DE, Cordeiro Renato Pinheiro, Albino Rômulo Ballarin
Universidade Estadual Paulista, Grupo de Medicina e Cirurgia do Pé e Tornozelo, Department of Surgery and Orthopedics, Botucatu, SP, Brazil.
Universidade Federal de São Paulo, Escola Paulista de Medicina, Departamento de Ortopedia e Traumatologia, São Paulo, SP, Brazil.
Acta Ortop Bras. 2025 Apr 7;33(spe1):e280947. doi: 10.1590/1413-785220243201e280947. eCollection 2025.
ChatGPT, an advanced Artificial Intelligence model specialized in natural language processing, shows remarkable abilities, achieving high scores in certification exams in various specialties. This study aims to evaluate ChatGPT's performance in multiple-choice tests applied to obtain specialist certification in Orthopedics and Traumatology.
We used ChatGPT 4.0 to answer the 100 questions from the first phase of the 2022 Specialist in Orthopedics and Traumatology Test (TEOT). We excluded non-text-based questions. Each question was entered individually into ChatGPT, with a new session initiated for each question. Performance was evaluated in relation to the number of words in each question and the question's taxonomic classification.
Of the 95 questions analyzed, ChatGPT answered 61.05% correctly and 38.95% incorrectly. There was no statistically significant difference in performance according to the number of words in the question, and ChatGPT's performance did not vary according to taxonomic level.
ChatGPT demonstrated vast knowledge in Orthopedics, with acceptable performance in the TEOT exam. The results suggest that ChatGPT can serve as an educational and clinical resource in Orthopedics, but it needs further development and human supervision to be applied effectively.
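The abstract reports percentages rather than raw counts. As a minimal sketch, the snippet below recovers the whole-number counts that are consistent with the reported figures (95 questions analyzed, 61.05% correct, 38.95% incorrect). The function name `count_from_percentage` is hypothetical, and reading the 5 unanalyzed questions as the excluded non-text-based items is an assumption drawn from the methods description, not a figure stated by the authors.

```python
# Figures taken from the abstract.
TOTAL_QUESTIONS = 100   # questions in the first phase of the exam
ANALYZED = 95           # questions analyzed after exclusions
PCT_CORRECT = 61.05     # reported percentage answered correctly
PCT_INCORRECT = 38.95   # reported percentage answered incorrectly


def count_from_percentage(pct, n):
    """Return the integer k such that k/n rounds to pct (two decimals), else None.

    Hypothetical helper for illustration only.
    """
    k = round(pct * n / 100)
    return k if round(100 * k / n, 2) == pct else None


correct = count_from_percentage(PCT_CORRECT, ANALYZED)
incorrect = count_from_percentage(PCT_INCORRECT, ANALYZED)

# Assumption: the questions not analyzed are the excluded non-text-based ones.
excluded = TOTAL_QUESTIONS - ANALYZED

print(f"correct={correct}, incorrect={incorrect}, excluded={excluded}")
```

Under these assumptions the two percentages correspond to counts that sum to the 95 analyzed questions, which is a quick consistency check on the reported results.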