

Evaluating ChatGPT and DeepSeek in postdural puncture headache management: a comparative study with international consensus guidelines.

Author Information

Deng Jiayi, Qiu Xu, Dong Chengqi, Xu Li, Dong Xiaoxue, Yang Shiyue, Li Qinghua, Mei Tao, Chen Shi, Wu Yali, Sun Jianliang, He Feifang, Wang Hanbin, Yu Liang

Affiliations

The Fourth Clinical School of Medicine, Zhejiang Chinese Medical University, Hangzhou First People's Hospital, Hangzhou, China.

Department of Pain, The Affiliated Hangzhou First People's Hospital, Westlake University School of Medicine, Hangzhou, China.

Publication Information

BMC Neurol. 2025 Jul 1;25(1):264. doi: 10.1186/s12883-025-04280-8.

Abstract

OBJECTIVE

To evaluate whether ChatGPT and DeepSeek can provide healthcare professionals in clinical practice with accurate information on the prevention, diagnosis, and management of post-dural puncture headache (PDPH), and in particular to compare the responses of ChatGPT-4o, ChatGPT-4o mini, DeepSeek-V3, and DeepSeek with Deep Think (R1) against the consensus practice guidelines for headache after dural puncture.

BACKGROUND

Post-dural puncture headache (PDPH) is a common complication of dural puncture. Evidence-based guidance on the prevention, diagnosis, and management of PDPH has been lacking; the 2023 consensus guidelines now provide comprehensive recommendations. As AI develops and becomes more widely adopted, increasing numbers of people, both patients and doctors, are using AI models. However, the quality of the answers these models provide has not yet been tested.

METHODS

Responses from ChatGPT-4o, ChatGPT-4o mini, DeepSeek-V3, and DeepSeek-R1 were evaluated against PDPH guidelines using four dimensions: Accuracy (guideline adherence), Overconclusiveness (unjustified recommendations), Supplementary information (additional relevant details), and Incompleteness (omission of critical guidelines). A 5-point Likert scale further assessed response accuracy and completeness.
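As a rough illustration of how per-item judgments along these four dimensions could be recorded and tallied into the proportions reported below, here is a minimal Python sketch. It is not the authors' actual workflow: the dimension names follow the abstract, but the data structure, function names, and demo ratings are hypothetical.

```python
# Hypothetical sketch of a per-item rating record and its summary.
# Dimension names follow the abstract; everything else is illustrative.
from dataclasses import dataclass

DIMENSIONS = ("accuracy", "overconclusiveness", "supplementary", "incompleteness")

@dataclass
class Rating:
    """Binary judgment of one model response against one guideline item."""
    accuracy: bool            # adheres to the guideline recommendation
    overconclusiveness: bool  # makes unjustified recommendations
    supplementary: bool       # adds relevant details beyond the guideline
    incompleteness: bool      # omits critical guideline content
    likert: int               # 1-5 overall accuracy/completeness score

def summarize(ratings: list[Rating]) -> dict[str, str]:
    """Report each dimension as a proportion of the rated guideline items."""
    n = len(ratings)
    return {dim: f"{sum(getattr(r, dim) for r in ratings)}/{n}" for dim in DIMENSIONS}

# Made-up ratings for a single model across 10 guideline items:
demo = [Rating(True, False, True, False, 5) for _ in range(10)]
print(summarize(demo))  # e.g. {'accuracy': '10/10', 'overconclusiveness': '0/10', ...}
```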

RESULTS

All four models showed high accuracy and completeness. Across the 10 clinical guidelines evaluated, ChatGPT-4o, ChatGPT-4o mini, DeepSeek-V3, and DeepSeek-R1 all achieved 100% accuracy in their responses (10/10) (p = 1). None of the four models produced overly conclusive results (p = 1). For supplementary information, ChatGPT-4o, ChatGPT-4o mini, and DeepSeek-R1 scored 100% (10/10) and DeepSeek-V3 scored 90% (9/10) (p = 1). For incompleteness, ChatGPT-4o scored 80% (8/10), DeepSeek-R1 scored 70% (7/10), and ChatGPT-4o mini and DeepSeek-V3 each scored 60% (6/10) (p = 0.729).
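The abstract does not state which statistical test produced these p-values. As one plausible reconstruction, offered purely as an assumption, counts for a single dimension can be compared across the four models with a chi-square test on a 2×4 contingency table; the sketch below plugs in the incompleteness counts quoted above.

```python
# Assumed analysis, not necessarily the study's actual method: chi-square
# test comparing one dimension's counts across the four models.
from scipy.stats import chi2_contingency

models = ["ChatGPT-4o", "DeepSeek-R1", "ChatGPT-4o mini", "DeepSeek-V3"]
scored = [8, 7, 6, 6]                 # incompleteness-dimension counts (of 10 items each)
not_scored = [10 - x for x in scored]

print(dict(zip(models, scored)))      # counts per model
chi2, p, dof, _ = chi2_contingency([scored, not_scored])
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.3f}")
```

Under this assumption the p-value comes out around 0.7, non-significant and of the same order as the reported p = 0.729; an exact test (e.g., Fisher's) on the same table would be another reasonable choice for counts this small.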

CONCLUSION

All four AI models demonstrate clinical validity, with ChatGPT-4o and DeepSeek-R1 showing stronger guideline alignment. Although largely accurate, their responses achieve only 60-80% completeness relative to the medical guidelines, so healthcare professionals must exercise caution when using AI tools and should critically evaluate outputs before clinical application. While these models are promising, their partial guideline coverage requires careful human oversight, and further validation research is essential before they can reliably support clinical decision-making for complex conditions such as PDPH.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/392b/12211737/9b0eb08a5cc4/12883_2025_4280_Fig1_HTML.jpg
