Large language models in neurosurgery: a systematic review and meta-analysis.

Affiliations

Harvard Medical School, Harvard University, Boston, MA, 02115, USA.

Computational Neuroscience Outcomes Center, Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Publication information

Acta Neurochir (Wien). 2024 Nov 23;166(1):475. doi: 10.1007/s00701-024-06372-9.

DOI:10.1007/s00701-024-06372-9
PMID:39579215
Abstract

BACKGROUND

Large Language Models (LLMs) have garnered increasing attention in neurosurgery and possess significant potential to improve the field. However, the breadth and performance of LLMs across diverse neurosurgical tasks have not been systematically examined, and LLMs come with their own challenges and unique terminology. We seek to identify key models, establish reporting guidelines for replicability, and highlight progress in key application areas of LLM use in the neurosurgical literature.

METHODS

We searched PubMed and Google Scholar using terms related to LLMs and neurosurgery: ("large language model" OR "LLM" OR "ChatGPT" OR "GPT-3" OR "GPT3" OR "GPT-3.5" OR "GPT3.5" OR "GPT-4" OR "GPT4" OR "LLAMA" OR "MISTRAL" OR "BARD") AND "neurosurgery". The final set of articles was reviewed for publication year, application area, specific LLM(s) used, control/comparison groups used to evaluate LLM performance, whether the article reported specific LLM prompts, prompting strategy types used, whether the LLM query could be reproduced in its entirety (including both the prompt used and any adjoining data), measures of hallucination, and reported performance measures.
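As a minimal sketch of how the Boolean search string in the Methods could be assembled reproducibly, the snippet below joins the listed terms into one query. The names `LLM_TERMS`, `FIELD_TERM`, and `build_query` are hypothetical and used only for illustration; the abstract does not describe how the authors actually constructed or submitted the query.

```python
# Search terms exactly as listed in the Methods paragraph.
LLM_TERMS = [
    "large language model", "LLM", "ChatGPT",
    "GPT-3", "GPT3", "GPT-3.5", "GPT3.5", "GPT-4", "GPT4",
    "LLAMA", "MISTRAL", "BARD",
]
FIELD_TERM = "neurosurgery"


def build_query(llm_terms, field_term):
    """Quote each term, OR the LLM terms together, then AND the field term."""
    or_block = " OR ".join(f'"{term}"' for term in llm_terms)
    return f'({or_block}) AND "{field_term}"'


QUERY = build_query(LLM_TERMS, FIELD_TERM)
print(QUERY)
```

Keeping the terms in a list rather than a hand-typed string makes it easier to report the exact query alongside a review, which is the kind of reproducibility detail the abstract calls for.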

RESULTS

Fifty-one articles met inclusion criteria, and were categorized into six application areas, with the most common being Generation of Text for Direct Clinical Use (n = 14, 27.5%), Answering Standardized Exam Questions (n = 12, 23.5%), and Clinical Judgement and Decision-Making Support (n = 11, 21.6%). The most frequently used LLMs were GPT-3.5 (n = 30, 58.8%), GPT-4 (n = 20, 39.2%), Bard (n = 9, 17.6%), and Bing (n = 6, 11.8%). Most studies (n = 43, 84.3%) used LLMs directly out-of-the-box, while 8 studies (15.7%) conducted advanced pre-training or fine-tuning.
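The percentages above all share the denominator of 51 included articles, so they can be recomputed directly from the reported counts. The following is an illustrative check only; the labels for the last two rows and the names `REPORTED`, `percent`, and `MISMATCHES` are assumptions introduced here, not taken from the paper.

```python
TOTAL_ARTICLES = 51  # articles meeting inclusion criteria

# (count, reported percentage) pairs copied from the Results paragraph.
REPORTED = {
    "Generation of Text for Direct Clinical Use": (14, 27.5),
    "Answering Standardized Exam Questions": (12, 23.5),
    "Clinical Judgement and Decision-Making Support": (11, 21.6),
    "GPT-3.5": (30, 58.8),
    "GPT-4": (20, 39.2),
    "Bard": (9, 17.6),
    "Bing": (6, 11.8),
    "Out-of-the-box use": (43, 84.3),
    "Pre-training or fine-tuning": (8, 15.7),
}


def percent(count, total=TOTAL_ARTICLES):
    """Share of the included articles, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Collect any label whose recomputed percentage differs from the reported one.
MISMATCHES = {
    label: (percent(count), reported)
    for label, (count, reported) in REPORTED.items()
    if percent(count) != reported
}
print(MISMATCHES or "all reported percentages match")
```

The model counts sum to more than 51 because a single study can evaluate several LLMs, whereas the out-of-the-box and fine-tuning groups partition the 51 articles (43 + 8).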

CONCLUSIONS

Large language models show advanced capabilities in complex tasks and hold potential to transform neurosurgery. However, research typically addresses basic applications, overlooks ways to enhance LLM performance, and faces reproducibility issues. Standardizing detailed reporting, considering LLM stochasticity, and using advanced methods beyond basic validation are essential for progress.

Similar articles

1
Large language models in neurosurgery: a systematic review and meta-analysis.
Acta Neurochir (Wien). 2024 Nov 23;166(1):475. doi: 10.1007/s00701-024-06372-9.
2
Large Language Models and Empathy: Systematic Review.
J Med Internet Res. 2024 Dec 11;26:e52597. doi: 10.2196/52597.
3
Applications and Concerns of ChatGPT and Other Conversational Large Language Models in Health Care: Systematic Review.
J Med Internet Res. 2024 Nov 7;26:e22769. doi: 10.2196/22769.
4
Examining the Role of Large Language Models in Orthopedics: Systematic Review.
J Med Internet Res. 2024 Nov 15;26:e59607. doi: 10.2196/59607.
5
Implementing Large Language Models in Health Care: Clinician-Focused Review With Interactive Guideline.
J Med Internet Res. 2025 Jul 11;27:e71916. doi: 10.2196/71916.
6
Use of Large Language Models to Classify Epidemiological Characteristics in Synthetic and Real-World Social Media Posts About Conjunctivitis Outbreaks: Infodemiology Study.
J Med Internet Res. 2025 Jul 2;27:e65226. doi: 10.2196/65226.
7
Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis.
J Med Internet Res. 2025 Jun 9;27:e72062. doi: 10.2196/72062.
8
A dataset and benchmark for hospital course summarization with adapted large language models.
J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
9
Large Language Model Architectures in Health Care: Scoping Review of Research Perspectives.
J Med Internet Res. 2025 Jun 19;27:e70315. doi: 10.2196/70315.
10
Using Large Language Models to Detect Depression From User-Generated Diary Text Data as a Novel Approach in Digital Mental Health Screening: Instrument Validation Study.
J Med Internet Res. 2024 Sep 18;26:e54617. doi: 10.2196/54617.

Cited by

1
Comparative performance of neurosurgery-specific, peer-reviewed versus general AI chatbots in bilingual board examinations: evaluating accuracy, consistency, and error minimization strategies.
Acta Neurochir (Wien). 2025 Sep 9;167(1):241. doi: 10.1007/s00701-025-06628-y.
2
Specialized AI and neurosurgeons in niche expertise: a proof-of-concept in neuromodulation with vagus nerve stimulation.
Acta Neurochir (Wien). 2025 Jul 25;167(1):203. doi: 10.1007/s00701-025-06610-8.
3
Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis.
J Med Internet Res. 2025 Apr 30;27:e64486. doi: 10.2196/64486.
4
Employing large language models safely and effectively as a practicing neurosurgeon.
Acta Neurochir (Wien). 2025 Apr 9;167(1):101. doi: 10.1007/s00701-025-06515-6.
5
Evaluating Large Language Models for Automated CPT Code Prediction in Endovascular Neurosurgery.
J Med Syst. 2025 Jan 24;49(1):15. doi: 10.1007/s10916-025-02149-4.

References

1
Evaluating the Adherence of Large Language Models to Surgical Guidelines: A Comparative Analysis of Chatbot Recommendations and North American Spine Society (NASS) Coverage Criteria.
Cureus. 2024 Sep 3;16(9):e68521. doi: 10.7759/cureus.68521. eCollection 2024 Sep.
2
Text-to-video generative artificial intelligence: sora in neurosurgery.
Neurosurg Rev. 2024 Jun 13;47(1):272. doi: 10.1007/s10143-024-02514-w.
3
Enhancing Diagnostic Support for Chiari Malformation and Syringomyelia: A Comparative Study of Contextualized ChatGPT Models.
World Neurosurg. 2024 Sep;189:e86-e107. doi: 10.1016/j.wneu.2024.05.172. Epub 2024 Jun 1.
4
Accuracy of ChatGPT in Neurolocalization.
Cureus. 2024 Apr 27;16(4):e59143. doi: 10.7759/cureus.59143. eCollection 2024 Apr.
5
Educational Limitations of ChatGPT in Neurosurgery Board Preparation.
Cureus. 2024 Apr 20;16(4):e58639. doi: 10.7759/cureus.58639. eCollection 2024 Apr.
6
Can Artificial Intelligence Mitigate Missed Diagnoses by Generating Differential Diagnoses for Neurosurgeons?
World Neurosurg. 2024 Jul;187:e1083-e1088. doi: 10.1016/j.wneu.2024.05.052. Epub 2024 May 16.
7
Recent Outcomes and Challenges of Artificial Intelligence, Machine Learning, and Deep Learning in Neurosurgery.
World Neurosurg X. 2024 Mar 8;23:100301. doi: 10.1016/j.wnsx.2024.100301. eCollection 2024 Jul.
8
Can AI pass the written European Board Examination in Neurological Surgery? - Ethical and practical issues.
Brain Spine. 2024 Feb 13;4:102765. doi: 10.1016/j.bas.2024.102765. eCollection 2024.
9
Chat-GPT on brain tumors: An examination of Artificial Intelligence/Machine Learning's ability to provide diagnoses and treatment plans for example neuro-oncology cases.
Clin Neurol Neurosurg. 2024 Apr;239:108238. doi: 10.1016/j.clineuro.2024.108238. Epub 2024 Mar 9.
10
Large language models assisted multi-effect variants mining on cerebral cavernous malformation familial whole genome sequencing.
Comput Struct Biotechnol J. 2024 Feb 1;23:843-858. doi: 10.1016/j.csbj.2024.01.014. eCollection 2024 Dec.