Artificial Intelligence Outperforms Physicians in General Medical Knowledge, Except in the Paediatrics Domain: A Cross-Sectional Study.

Authors

Miranda Joana, Pereira-Silva Raquel, Guichard João, Meneses Jorge, Carreira Andreia Neves, Seixas Daniela

Affiliation

Tonic Easy Medical, S.A., 4300-259 Porto, Portugal.

Publication information

Bioengineering (Basel). 2025 Jun 14;12(6):653. doi: 10.3390/bioengineering12060653.

Abstract

Generative artificial intelligence (genAI) shows promising results in clinical practice. This study compared a GPT-4-turbo virtual assistant with physicians from Italy, France, Spain, and Portugal on medical knowledge derived from national exams while analysing knowledge retention over time and domain-specific performance. Via a digital platform, 17,144 physicians provided 221,574 answers to 600 exam questions between December 2022 and February 2024. Physicians were stratified by years since graduation and specialty, and the assistant answered the same questions in each native language. Differences in proportions of correct answers were tested with binomial logistic regression (odds ratios, 95% CI) or Fisher's exact test (α = 0.05). The assistant outperformed physicians in all countries (72-96% vs. 46-62%; logistic regression, P < 0.001). Physicians also trailed the assistant across most knowledge domains (P < 0.001), except paediatrics (45% vs. 52%; Fisher, P = 0.60). Accuracy declined with seniority, falling 4-10% between the youngest and oldest cohorts (logistic regression, P < 0.001). Overall, genAI exceeds practising doctors on broad medical knowledge and may help counter knowledge attrition, though paediatrics remains a domain requiring targeted refinement.
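
The two tests named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names and the example counts are hypothetical, and the abstract does not report the underlying 2x2 tables. The sketch assumes a single binary predictor (assistant vs. physician), in which case the odds ratio from a binomial logistic regression equals the cross-product ratio of the 2x2 table and its Wald 95% CI can be computed directly; the study's actual models may have included further covariates.

```python
from math import comb, exp, log, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Cross-product odds ratio with a Wald 95% CI (all cells must be > 0)."""
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


if __name__ == "__main__":
    # HYPOTHETICAL counts, chosen only to mirror the reported 45% vs. 52%:
    # rows = (assistant, physicians), columns = (correct, incorrect).
    table = (9, 11, 52, 48)
    print("Fisher two-sided p:", fisher_exact_two_sided(*table))
    print("OR, 95%% CI: %.2f (%.2f to %.2f)" % odds_ratio_wald_ci(*table))
```

Fisher's exact test is the usual choice when one group is small, as with a single assistant answering a limited set of questions in one domain, which would be consistent with its use for the paediatrics comparison.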

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82c7/12190018/8d8f4a7a16aa/bioengineering-12-00653-g001.jpg
