Comparison of ChatGPT versions in informing patients with rotator cuff injuries.

Author Information

Günay Ali Eray, Özer Alper, Yazıcı Alparslan, Sayer Gökhan

Affiliations

Department of Orthopedics and Traumatology, Kayseri City Training and Research Hospital, Kayseri, Turkey.

Department of Orthopedics and Traumatology, Develi State Hospital, Kayseri, Turkey.

Publication Information

JSES Int. 2024 May 6;8(5):1016-1018. doi: 10.1016/j.jseint.2024.04.016. eCollection 2024 Sep.

Abstract

BACKGROUND

The aim of this study is to evaluate whether Chat Generative Pretrained Transformer (ChatGPT) can be recommended as a resource for informing patients planning rotator cuff repairs, and to assess the differences between ChatGPT 3.5 and 4.0 versions in terms of information content and readability.

METHODS

In August 2023, 13 questions commonly asked by patients with rotator cuff disease were posed to the ChatGPT 3.5 and ChatGPT 4 programs by 3 surgeons experienced in rotator cuff surgery, using computers with different internet protocol (IP) addresses. After converting the answers from both versions into text, the quality and readability of the answers were examined.

RESULTS

The average Journal of the American Medical Association score for both versions was 0, and the average DISCERN score was 61.6. A statistically significant and strong correlation was found between ChatGPT 3.5 and 4.0 DISCERN scores. There was excellent agreement in DISCERN scores for both versions among the 3 evaluators. ChatGPT 3.5 was found to be less readable than ChatGPT 4.0.

CONCLUSION

The information provided by the ChatGPT conversational system was evaluated as of high quality, but there were significant shortcomings in terms of reliability due to the lack of citations. Despite the ChatGPT 4.0 version having higher readability scores, both versions were considered difficult to read.
