Investigating the role of large language models on questions about refractive surgery.

Author information

Demir Suleyman

Affiliation

Adana 5 Ocak State Hospital, Department of Ophthalmology, Adana, Turkey.

Publication information

Int J Med Inform. 2025 Mar;195:105787. doi: 10.1016/j.ijmedinf.2025.105787. Epub 2025 Jan 6.

DOI:10.1016/j.ijmedinf.2025.105787

PMID:39787660

Abstract

BACKGROUND

Large language models (LLMs) are becoming increasingly popular and are playing an important role in providing accurate clinical information to both patients and physicians. This study aimed to investigate the effectiveness of ChatGPT-4.0, Google Gemini, and Microsoft Copilot LLMs for responding to patient questions regarding refractive surgery.

METHODS

The LLMs' responses to 25 questions about refractive surgery, which are frequently asked by patients, were evaluated by two ophthalmologists using a 5-point Likert scale, with scores ranging from 1 to 5. Furthermore, the DISCERN scale was used to assess the reliability of the language models' responses, whereas the Flesch Reading Ease and Flesch-Kincaid Grade Level indices were used to evaluate readability.
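For readers unfamiliar with the two readability indices named above, the following is a minimal sketch of how they are commonly computed from word, sentence, and syllable counts. The coefficients are the standard published ones for Flesch Reading Ease and Flesch-Kincaid Grade Level. The function names, the tokenization rules, and the vowel-group syllable counter are illustrative assumptions; the abstract does not say which tool the study used, and real calculators count syllables more carefully.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_counts(text):
    """Return (words, sentences, syllables) using simple regex tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def flesch_reading_ease(words, sentences, syllables):
    """Higher values indicate text that is easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Approximate US school grade level needed to understand the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical sample sentence, not taken from the study's data.
    sample = "Laser surgery reshapes the cornea. Most patients recover quickly."
    w, s, syl = text_counts(sample)
    print(round(flesch_reading_ease(w, s, syl), 1))
    print(round(flesch_kincaid_grade(w, s, syl), 1))
```

The two indices move in opposite directions: a response with long sentences and many polysyllabic words gets a lower Reading Ease score and a higher Grade Level.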

RESULTS

Significant differences were found among all three LLMs in the Likert scores (p = 0.022). Pairwise comparisons revealed that ChatGPT-4.0's Likert score was significantly higher than that of Microsoft Copilot, while no significant difference was found when compared to Google Gemini (p = 0.005 and p = 0.087, respectively). In terms of reliability, ChatGPT-4.0 stood out, receiving the highest DISCERN scores among the three LLMs. However, in terms of readability, ChatGPT-4.0 received the lowest score.

CONCLUSIONS

ChatGPT-4.0's responses to questions about refractive surgery were harder for patients to read than those of the other language models; however, the information it provided was more reliable and accurate.
