

Subthalamic nucleus or globus pallidus internus deep brain stimulation for the treatment of Parkinson's disease: An artificial intelligence approach.

Author information

Shin David, Tang Timothy, Carson Joel, Isaac Rekha, Dinh Chandler, Im Daniel, Fay Andrew, Isaac Asael, Cho Stephen, Brandt Zachary, Nguyen Kai, Shaffrey Isabel, Yacoubian Vahe, Taka Taha M, Spellicy Samantha, Lopez-Gonzalez Miguel Angel, Danisa Olumide

Affiliations

School of Medicine, Loma Linda University, Loma Linda, CA, USA.

School of Medicine, Duke University, Durham, NC, USA.

Publication information

J Clin Neurosci. 2025 Jun 18;138:111393. doi: 10.1016/j.jocn.2025.111393.

Abstract

BACKGROUND

The content of generative artificial intelligence (AI) responses concerning deep brain stimulation (DBS) has not yet been validated. This study sought to analyze AI responses to questions derived from the recommendations of the 2018 Congress of Neurological Surgeons (CNS) guidelines on subthalamic nucleus and globus pallidus internus DBS for the treatment of patients with Parkinson's disease.

METHODS

Seven questions were generated from the CNS guidelines and posed to ChatGPT 4o, Perplexity, Copilot, and Gemini. Answers were classified as "concordant" if they addressed all points provided by the CNS guidelines; otherwise, answers were considered "non-concordant" and sub-categorized as either "insufficient" or "over-conclusive." AI responses were evaluated for readability using the Flesch-Kincaid Grade Level, Gunning Fog Index, Simple Measure of Gobbledygook (SMOG) Index, and Flesch Reading Ease tests.
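To make the scoring step concrete, the minimal Python sketch below computes all four readability metrics for a sample response using the open-source textstat package. This is an illustration only: the paper does not name the tool it used for scoring, and the sample response text is hypothetical.

# Minimal sketch of the readability evaluation described above.
# Assumption: the textstat package (pip install textstat) stands in for
# whatever scoring tool the authors actually used.
import textstat

# Hypothetical AI response text; any string can be scored the same way.
response = (
    "Bilateral subthalamic nucleus deep brain stimulation improves motor "
    "function in appropriately selected patients with Parkinson's disease."
)

# Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
print("Flesch-Kincaid Grade Level:", textstat.flesch_kincaid_grade(response))
# Gunning Fog Index: 0.4*((words/sentences) + 100*(complex words/words))
print("Gunning Fog Index:", textstat.gunning_fog(response))
# SMOG Index: 1.0430*sqrt(polysyllable count * 30/sentences) + 3.1291
print("SMOG Index:", textstat.smog_index(response))
# Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words);
# lower scores indicate harder text.
print("Flesch Reading Ease:", textstat.flesch_reading_ease(response))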

RESULTS

ChatGPT 4o showed 42.9% concordance, with non-concordant responses classified as 14.3% insufficient and 42.8% over-conclusive. Perplexity displayed a 28.6% concordance rate, with 14.3% insufficient and 57.1% over-conclusive responses. Copilot showed 28.6% concordance, with 28.6% insufficient and 42.8% over-conclusive responses. Gemini demonstrated 28.6% concordance, with 28.6% insufficient and 42.8% over-conclusive responses. The Flesch-Kincaid Grade Level scores ranged from 14.44 (Gemini) to 18.94 (Copilot), Gunning Fog Index scores varied between 17.9 (Gemini) and 22.06 (Copilot), SMOG Index scores ranged from 16.54 (Gemini) to 19.67 (Copilot), and all Flesch Reading Ease scores were low, with Gemini showing the highest score of 30.91.
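Because each model answered the same seven questions, every reported percentage corresponds to a count out of seven: 1/7 ≈ 14.3%, 2/7 ≈ 28.6%, 3/7 ≈ 42.9% (elsewhere truncated to 42.8%), and 4/7 ≈ 57.1%.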

CONCLUSION

ChatGPT 4o showed the highest concordance rate, Perplexity the highest over-conclusive rate, and Copilot and Gemini the most insufficient answers. All responses were written at a complex reading level. Despite the possible benefits of future developments and innovation in AI capabilities, AI requires further improvement before independent clinical use in DBS.

