Specialized Large Language Model Outperforms Neurologists at Complex Diagnosis in Blinded Case-Based Evaluation.

Authors

Barrit Sami, Torcida Nathan, Mazeraud Aurelien, Boulogne Sebastien, Benoit Jeanne, Carette Timothée, Carron Thibault, Delsaut Bertil, Diab Eva, Kermorvant Hugo, Maarouf Adil, Maldonado Slootjes Sofia, Redon Sylvain, Robin Alexis, Hadidane Sofiene, Harlay Vincent, Tota Vito, Madec Tanguy, Niset Alexandre, Al Barajraji Mejdeddine, Madsen Joseph R, El Hadwe Salim, Massager Nicolas, Lagarde Stanislas, Carron Romain

Affiliations

Neurosurgery, Université Libre de Bruxelles, 1070 Brussels, Belgium.

Neurosurgery, CHU Tivoli, 7110 La Louvière, Belgium.

Publication information

Brain Sci. 2025 Mar 27;15(4):347. doi: 10.3390/brainsci15040347.

DOI: 10.3390/brainsci15040347
PMID: 40309809
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12025783/

Abstract

Background/Objectives: Artificial intelligence (AI), particularly large language models (LLMs), has demonstrated versatility in various applications but faces challenges in specialized domains like neurology. This study evaluates a specialized LLM's capability and trustworthiness in complex neurological diagnosis, comparing its performance to that of neurologists in simulated clinical settings.

Methods: We deployed GPT-4 Turbo (OpenAI, San Francisco, CA, US) through Neura (Sciense, New York, NY, US), an AI infrastructure with a dual-database architecture integrating "long-term memory" and "short-term memory" components on a curated neurological corpus. Five representative clinical scenarios were presented to 13 neurologists and the AI system. Participants formulated differential diagnoses based on initial presentations, followed by definitive diagnoses after receiving conclusive clinical information. Two senior academic neurologists blindly evaluated all responses, while an independent investigator assessed the verifiability of AI-generated information.

Results: AI achieved a significantly higher normalized score (86.17%) than neurologists (55.11%, p < 0.001). For differential diagnosis questions, AI scored 85% versus 46.15% for neurologists, and for final diagnosis, 88.24% versus 70.93%. AI obtained the maximum score in 15 of its 20 evaluations and responded in under 30 s, compared to the neurologists' average of 9 min. All AI-provided references were classified as relevant, with no hallucinatory content detected.

Conclusions: A specialized LLM demonstrated superior diagnostic performance compared to practicing neurologists across complex clinical challenges. This indicates that appropriately harnessed LLMs with curated knowledge bases can achieve domain-specific relevance in complex clinical disciplines, suggesting potential for AI as a time-efficient asset in clinical practice.
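
The abstract reports normalized scores as percentages but does not describe the scoring rubric. The following is a minimal sketch, assuming a normalized score is the points awarded divided by the maximum points available, pooled over evaluators and questions. The names `Rating` and `normalized_score` and all point values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    """One evaluator's score for one response (hypothetical rubric)."""
    awarded: float  # points the evaluator gave
    maximum: float  # maximum points available for that question


def normalized_score(ratings: list[Rating]) -> float:
    """Return total awarded points as a percentage of total available points."""
    total_max = sum(r.maximum for r in ratings)
    if total_max <= 0:
        raise ValueError("ratings must carry a positive total maximum")
    return 100.0 * sum(r.awarded for r in ratings) / total_max


# Illustrative values only; these are NOT data from the study.
example = [Rating(3, 3), Rating(2, 3), Rating(1, 2), Rating(2, 2)]
print(f"{normalized_score(example):.2f}%")  # 8 of 10 points -> 80.00%
```

Pooling points before dividing weights each question by its maximum; averaging per-question percentages instead would weight every question equally, and the two approaches can give different results.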

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22ed/12025783/f54c3c672ce9/brainsci-15-00347-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22ed/12025783/aa707d3de21e/brainsci-15-00347-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22ed/12025783/adea19d917c8/brainsci-15-00347-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22ed/12025783/9f9dcc149200/brainsci-15-00347-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22ed/12025783/e551032439f0/brainsci-15-00347-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22ed/12025783/18181dffb55c/brainsci-15-00347-g006.jpg

Similar articles

1
Specialized Large Language Model Outperforms Neurologists at Complex Diagnosis in Blinded Case-Based Evaluation.
Brain Sci. 2025 Mar 27;15(4):347. doi: 10.3390/brainsci15040347.
2
Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.
JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.
3
Evaluating AI Competence in Specialized Medicine: Comparative Analysis of ChatGPT and Neurologists in a Neurology Specialist Examination in Spain.
JMIR Med Educ. 2024 Nov 14;10:e56762. doi: 10.2196/56762.
4
Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT.
Clin Orthop Relat Res. 2023 Aug 1;481(8):1623-1630. doi: 10.1097/CORR.0000000000002704. Epub 2023 May 23.
5
ChatGPT4's diagnostic accuracy in inpatient neurology: A retrospective cohort study.
Heliyon. 2024 Dec 9;10(24):e40964. doi: 10.1016/j.heliyon.2024.e40964. eCollection 2024 Dec 30.
6
Leveraging Large Language Models for Precision Monitoring of Chemotherapy-Induced Toxicities: A Pilot Study with Expert Comparisons and Future Directions.
Cancers (Basel). 2024 Aug 12;16(16):2830. doi: 10.3390/cancers16162830.
7
Evaluating ChatGPT-4's Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases.
JMIR Form Res. 2024 Jun 26;8:e59267. doi: 10.2196/59267.
8
Artificial intelligence versus neurologists: A comparative study on multiple sclerosis expertise.
Clin Neurol Neurosurg. 2025 Mar;250:108785. doi: 10.1016/j.clineuro.2025.108785. Epub 2025 Feb 20.
9
Assessing the Role of the Generative Pretrained Transformer (GPT) in Alzheimer's Disease Management: Comparative Study of Neurologist- and Artificial Intelligence-Generated Responses.
J Med Internet Res. 2024 Oct 31;26:e51095. doi: 10.2196/51095.
10
Diagnostic accuracy of large language models in psychiatry.
Asian J Psychiatr. 2024 Oct;100:104168. doi: 10.1016/j.ajp.2024.104168. Epub 2024 Jul 25.
