Diagnosis, treatment, and prevention of ankle sprains: Comparing free chatbot recommendations with clinical guidelines.

Authors

Roch Friederike Eva, Hahn Franziska Melanie, Jäckle Katharina, Meier Marc-Pascal, Stinus Hartmut, Lehmann Wolfgang, Perthel Ronny, Roch Paul Jonathan

Affiliation

Department of Trauma Surgery, Orthopaedics and Plastic Surgery, University of Göttingen, Robert-Koch-Str. 40, Göttingen 37075, Germany.

Publication details

Foot Ankle Surg. 2025 Jun;31(4):329-351. doi: 10.1016/j.fas.2024.12.003. Epub 2024 Dec 13.

DOI:10.1016/j.fas.2024.12.003

PMID:39730224

Abstract

BACKGROUND

Free chatbots powered by large language models offer treatment recommendations for lateral ankle sprains (LAS), but these recommendations lack scientific validation.

METHODS

Three chatbots (Claude, Perplexity, and ChatGPT) were evaluated by comparing their responses to a questionnaire and their treatment algorithms against current clinical guidelines. Responses were graded on accuracy, conclusiveness, supplementary information, and incompleteness, and evaluated individually and collectively, with a 60% pass threshold.

RESULTS

In the collective analysis of the questionnaire, Perplexity scored significantly higher than Claude and ChatGPT (p < 0.001). In the individual analysis, Perplexity provided significantly more supplementary information than the other chatbots (p < 0.001). All chatbots met the pass threshold. In the algorithm evaluation, ChatGPT scored significantly higher than the others (p = 0.023), and Perplexity fell below the pass threshold.

CONCLUSIONS

Chatbots' recommendations generally aligned with current guidelines but sometimes missed crucial details. While they offer useful supplementary information, they cannot yet replace professional medical consultation or established guidelines.
