

Evaluating ChatGPT and DeepSeek in postdural puncture headache management: a comparative study with international consensus guidelines.

Author Information

Deng Jiayi, Qiu Xu, Dong Chengqi, Xu Li, Dong Xiaoxue, Yang Shiyue, Li Qinghua, Mei Tao, Chen Shi, Wu Yali, Sun Jianliang, He Feifang, Wang Hanbin, Yu Liang

Affiliations

The Fourth Clinical School of Medicine, Zhejiang Chinese Medical University, Hangzhou First People's Hospital, Hangzhou, China.

Department of Pain, The Affiliated Hangzhou First People's Hospital, Westlake University School of Medicine, Hangzhou, China.

Publication Information

BMC Neurol. 2025 Jul 1;25(1):264. doi: 10.1186/s12883-025-04280-8.

Abstract

OBJECTIVE

To evaluate whether ChatGPT and DeepSeek can provide healthcare professionals in clinical practice with accurate information on the prevention, diagnosis, and management of post-dural puncture headache (PDPH), and in particular to compare the responses of ChatGPT-4o, ChatGPT-4o mini, DeepSeek-V3, and DeepSeek with Deep Think (R1) against the consensus practice guidelines for headache after dural puncture.

BACKGROUND

Post-dural puncture headache (PDPH) is a common complication of dural puncture. Evidence-based guidance on the prevention, diagnosis, and management of PDPH has been lacking; the 2023 consensus guidelines now provide comprehensive recommendations. As AI develops and becomes more widely adopted, increasing numbers of people, both patients and doctors, are using AI models. However, the quality of the answers these models provide has not yet been tested.

METHODS

Responses from ChatGPT-4o, ChatGPT-4o mini, DeepSeek-V3, and DeepSeek-R1 were evaluated against PDPH guidelines using four dimensions: Accuracy (guideline adherence), Overconclusiveness (unjustified recommendations), Supplementary information (additional relevant details), and Incompleteness (omission of critical guidelines). A 5-point Likert scale further assessed response accuracy and completeness.
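As a rough illustration of how per-item judgments along these four dimensions could be recorded and tallied into the proportions reported below, here is a minimal Python sketch. It is not the authors' actual workflow: the dimension names follow the abstract, but the data structure, function names, and demo ratings are hypothetical.

```python
# Hypothetical sketch of a per-item rating record and its summary.
# Dimension names follow the abstract; everything else is illustrative.
from dataclasses import dataclass

DIMENSIONS = ("accuracy", "overconclusiveness", "supplementary", "incompleteness")

@dataclass
class Rating:
    """Binary judgment of one model response against one guideline item."""
    accuracy: bool            # adheres to the guideline recommendation
    overconclusiveness: bool  # makes unjustified recommendations
    supplementary: bool       # adds relevant details beyond the guideline
    incompleteness: bool      # omits critical guideline content
    likert: int               # 1-5 overall accuracy/completeness score

def summarize(ratings: list[Rating]) -> dict[str, str]:
    """Report each dimension as a proportion of the rated guideline items."""
    n = len(ratings)
    return {dim: f"{sum(getattr(r, dim) for r in ratings)}/{n}" for dim in DIMENSIONS}

# Made-up ratings for a single model across 10 guideline items:
demo = [Rating(True, False, True, False, 5) for _ in range(10)]
print(summarize(demo))  # e.g. {'accuracy': '10/10', 'overconclusiveness': '0/10', ...}
```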

RESULTS

All four models showed high accuracy and completeness. Across the 10 clinical guidelines evaluated, ChatGPT-4o, ChatGPT-4o mini, DeepSeek-V3, and DeepSeek-R1 all achieved 100% accuracy in their responses (10/10) (p = 1). None of the four models produced overly conclusive results (p = 1). For supplementary information, ChatGPT-4o, ChatGPT-4o mini, and DeepSeek-R1 scored 100% (10/10) and DeepSeek-V3 scored 90% (9/10) (p = 1). For incompleteness, ChatGPT-4o scored 80% (8/10), DeepSeek-R1 scored 70% (7/10), and ChatGPT-4o mini and DeepSeek-V3 each scored 60% (6/10) (p = 0.729).
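The abstract does not state which statistical test produced these p-values. As one plausible reconstruction, offered purely as an assumption, counts for a single dimension can be compared across the four models with a chi-square test on a 2×4 contingency table; the sketch below plugs in the incompleteness counts quoted above.

```python
# Assumed analysis, not necessarily the study's actual method: chi-square
# test comparing one dimension's counts across the four models.
from scipy.stats import chi2_contingency

models = ["ChatGPT-4o", "DeepSeek-R1", "ChatGPT-4o mini", "DeepSeek-V3"]
scored = [8, 7, 6, 6]                 # incompleteness-dimension counts (of 10 items each)
not_scored = [10 - x for x in scored]

print(dict(zip(models, scored)))      # counts per model
chi2, p, dof, _ = chi2_contingency([scored, not_scored])
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.3f}")
```

Under this assumption the p-value comes out around 0.7, non-significant and of the same order as the reported p = 0.729; an exact test (e.g., Fisher's) on the same table would be another reasonable choice for counts this small.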

CONCLUSION

All four AI models demonstrate clinical validity, with ChatGPT-4o and DeepSeek-R1 showing stronger guideline alignment. Although largely accurate, their responses achieve only 60-80% completeness relative to the medical guidelines, so healthcare professionals must exercise caution when using AI tools and should critically evaluate outputs before clinical application. While these models are promising, their partial guideline coverage requires careful human oversight, and further validation research is essential before they can reliably support clinical decision-making for complex conditions such as PDPH.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/392b/12211737/9b0eb08a5cc4/12883_2025_4280_Fig1_HTML.jpg
