
Evaluating DeepResearch and DeepThink in anterior cruciate ligament surgery patient education: ChatGPT-4o excels in comprehensiveness, DeepSeek R1 leads in clarity and readability of orthopaedic information.

Author Information

Gültekin Onur, Inoue Jumpei, Yilmaz Baris, Cerci Mehmet Halis, Kilinc Bekir Eray, Yilmaz Hüsnü, Prill Robert, Kayaalp Mahmut Enes

Affiliations

Department of Orthopaedics and Traumatology, Istanbul Fatih Sultan Mehmet Training and Research Hospital, University of Health Sciences, Istanbul, Turkey.

Department of Orthopaedic Surgery, Nagoya Tokushukai General Hospital, Kasugai, Aichi, Japan.

Publication Information

Knee Surg Sports Traumatol Arthrosc. 2025 Jun 1. doi: 10.1002/ksa.12711.

Abstract

PURPOSE

This study compares ChatGPT-4o, equipped with its DeepResearch feature, and DeepSeek R1, equipped with its DeepThink feature (both enabling real-time online data access), in generating responses to frequently asked questions (FAQs) about anterior cruciate ligament (ACL) surgery. The aim is to evaluate and compare their performance in terms of accuracy, clarity, completeness, consistency and readability for evidence-based patient education.

METHODS

A list of ten FAQs about ACL surgery was compiled after reviewing the Sports Medicine Fellowship Institution's webpages. These questions were posed to ChatGPT and DeepSeek in their research-enabled modes. Orthopaedic sports surgeons evaluated the responses for accuracy, clarity, completeness, and consistency using a 4-point Likert scale. Inter-rater reliability of the evaluations was assessed using intraclass correlation coefficients (ICCs). In addition, a readability analysis was conducted using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES) metrics via an established online calculator to objectively measure textual complexity. Paired t tests were used to compare the mean scores of the two models for each criterion, with significance set at p < 0.05.
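The FKGL and FRES metrics used here follow standard published formulas based on words per sentence and syllables per word. The abstract does not specify which online calculator was used; the sketch below is only an illustration of the two formulas, with a crude heuristic syllable counter (established calculators implement syllable counting far more carefully):

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, subtract a silent trailing 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FKGL, FRES) for a passage of English prose."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    return fkgl, fres
```

Longer sentences and more polysyllabic words raise FKGL (grade level) and lower FRES (ease), which is why DeepSeek's shorter, plainer answers scored as easier to read.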

RESULTS

Both models demonstrated high accuracy (mean scores of 3.9/4) and consistency (4/4). Significant differences were observed in clarity and completeness: ChatGPT provided more comprehensive responses (mean completeness 4.0 vs. 3.2, p < 0.001), while DeepSeek's answers were clearer and more accessible to laypersons (mean clarity 3.9 vs. 3.0, p < 0.001). DeepSeek had lower FKGL (8.9 vs. 14.2, p < 0.001) and higher FRES (61.3 vs. 32.7, p < 0.001), indicating greater ease of reading for a general audience. ICC analysis indicated substantial inter-rater agreement (composite ICC = 0.80).

CONCLUSION

ChatGPT-4o, leveraging its DeepResearch feature, and DeepSeek R1, utilizing its DeepThink feature, both deliver high-quality, accurate information for ACL surgery patient education. While ChatGPT excels in comprehensiveness, DeepSeek outperforms in clarity and readability, suggesting that integrating the strengths of both models could optimize patient education outcomes.

LEVEL OF EVIDENCE

Level V.

