Rodrigues Alessi Mateus, Gomes Heitor A, Lopes de Castro Matheus, Terumy Okamoto Cristina
School of Medicine, Universidade Positivo, Curitiba, BRA.
Neonatology, Universidade Positivo, Curitiba, BRA.
Cureus. 2024 Jul 19;16(7):e64924. doi: 10.7759/cureus.64924. eCollection 2024 Jul.
Background: The use of artificial intelligence (AI) is not a recent phenomenon, but the latest advancements in this technology are making a significant impact across various fields of human knowledge. Medicine is no exception to this trend, although adoption there has developed at a slower pace. ChatGPT is an example of an AI-based algorithm capable of answering questions, interpreting phrases, and synthesizing complex information, potentially aiding and even replacing humans in various areas of social interest. Some studies have compared its performance on medical knowledge examinations with that of medical students and professionals to verify its accuracy. This study aimed to measure the performance of ChatGPT in answering questions from the Progress Test from 2021 to 2023.

Methodology: An observational study was conducted in which questions from the 2021 Progress Test and the regional tests (Southern Institutional Pedagogical Support Center II) of 2022 and 2023 were presented to ChatGPT 3.5. The results obtained were compared with the scores of first- to sixth-year medical students from over 120 Brazilian universities. All questions were presented sequentially, without any modification to their structure. After each question was presented, the platform's history was cleared, and the site was restarted.

Results: The platform achieved average accuracy rates of 69.7%, 68.3%, and 67.2% in 2021, 2022, and 2023, respectively, surpassing students from all medical years in the three tests evaluated and reinforcing findings in the current literature. The subject in which the AI scored best was Public Health, with a mean grade of 77.8%.

Conclusions: ChatGPT demonstrated the ability to answer medical questions with higher accuracy than medical students, including those in the final year of medical school.