Hlavinka William J, Sontam Tarun R, Gupta Anuj, Croen Brett J, Abdullah Mohammed S, Humbyrd Casey J
Texas A&M School of Medicine, Baylor University Medical Center, Department of Medical Education, 3500 Gaston Avenue, 6-Roberts, Dallas, TX 75246, USA.
Department of Orthopedic Surgery, University of Pennsylvania Health System, 51 N 39th St, Philadelphia, PA 19104, USA.
Foot Ankle Surg. 2025 Jan;31(1):15-19. doi: 10.1016/j.fas.2024.08.002. Epub 2024 Aug 6.
This study evaluates the accuracy and readability of responses from Google, ChatGPT-3.5, and ChatGPT-4.0 (two versions of an artificial intelligence model) to common questions regarding bunion surgery.
A Google search of "bunionectomy" was performed, and the first ten questions under "People Also Ask" were recorded. ChatGPT-3.5 and 4.0 were each asked these ten questions individually, and their answers were analyzed using the Flesch-Kincaid Reading Ease and Gunning Fog Index algorithms.
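Both readability measures used in the methods are simple closed-form formulas over word, sentence, syllable, and complex-word counts. As a minimal sketch (the function names and the choice to pass pre-computed counts rather than raw text are illustrative, not from the study), they can be computed as:

```python
def flesch_kincaid_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Reading Ease: higher scores mean easier text.

    Scores of roughly 60-70 correspond to plain English readable by
    13- to 15-year-olds; professional/academic text often scores below 30.
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def gunning_fog_index(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index: estimates the U.S. school grade level needed.

    "Complex words" are words of three or more syllables (excluding
    proper nouns, familiar jargon, and common suffix forms).
    """
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


# Example: a 100-word passage with 5 sentences, 150 syllables,
# and 10 complex words.
print(flesch_kincaid_reading_ease(100, 5, 150))  # 59.635
print(gunning_fog_index(100, 5, 10))             # 12.0
```

In practice, studies like this one typically obtain the counts (and scores) from an automated readability tool rather than computing them by hand; the formulas above show what those tools calculate.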
Compared with Google, ChatGPT-3.5 and 4.0 produced longer responses, with 315 ± 39 words (p < .0001) and 294 ± 39 words (p < .0001), respectively. Flesch-Kincaid Reading Ease scores also differed significantly between both ChatGPT versions and Google (p < .0001).
Our findings demonstrate that ChatGPT provided significantly longer responses than Google, with a significant difference in reading ease. Both platforms exceeded the seventh- to eighth-grade reading level recommended for U.S. health information.