Balci Ali Safa, Çakmak Semih
Department of Ophthalmology, Sehit Prof. Dr. Ilhan Varank Sancaktepe Training and Research Hospital, University of Health Sciences, Istanbul, Türkiye.
Department of Ophthalmology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye.
Ophthalmic Epidemiol. 2025 Mar 28:1-6. doi: 10.1080/09286586.2025.2484760.
This study aimed to evaluate the accuracy and readability of responses generated by ChatGPT-4o, an advanced large language model, to frequently asked patient-centered questions about keratoconus.
A cross-sectional, observational study was conducted in which ChatGPT-4o answered 30 questions that patients with keratoconus might commonly ask. The accuracy of the responses was evaluated by two board-certified ophthalmologists and scored on a scale of 1 to 5. Readability was assessed using the Simple Measure of Gobbledygook (SMOG), Flesch-Kincaid Grade Level (FKGL), and Flesch Reading Ease (FRE) scores. Descriptive, treatment-related, and follow-up-related questions were analyzed, and statistical comparisons were performed between these categories.
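The three readability indices named above are defined by standard published formulas based on word, sentence, syllable, and polysyllable counts. As an illustration (not the study's own code; the counting of syllables from raw text is assumed to be done upstream), they can be sketched as:

```python
import math

def readability_scores(words, sentences, syllables, polysyllables):
    """Compute FRE, FKGL, and SMOG from raw text counts.

    words / sentences / syllables are totals for the text;
    polysyllables is the number of words with 3+ syllables
    (used only by SMOG).
    """
    wps = words / sentences      # mean words per sentence
    spw = syllables / words      # mean syllables per word

    # Flesch Reading Ease: higher = easier (0-100 typical range)
    fre = 206.835 - 1.015 * wps - 84.6 * spw

    # Flesch-Kincaid Grade Level: approximate US school grade
    fkgl = 0.39 * wps + 11.8 * spw - 15.59

    # SMOG grade, normalized to a 30-sentence sample
    smog = 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

    return {"FRE": round(fre, 2), "FKGL": round(fkgl, 2), "SMOG": round(smog, 2)}
```

An FRE near 27, as reported below, corresponds to "very difficult" text on the conventional Flesch scale, while FKGL and SMOG values around 15 indicate college-level reading difficulty.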
The mean accuracy score for the responses was 4.48 ± 0.57 on a 5-point Likert scale. Interrater reliability, with an intraclass correlation coefficient of 0.769, indicated a strong level of agreement. Readability scores revealed a SMOG score of 15.49 ± 1.74, an FKGL score of 14.95 ± 1.95, and an FRE score of 27.41 ± 9.71, indicating that a high level of education is required to comprehend the responses. There was no significant difference in accuracy among the question categories (p = 0.161), but readability varied significantly, with treatment-related questions being the easiest to understand.
ChatGPT-4o provides highly accurate responses to patient-centered questions about keratoconus, though the complexity of its language may limit accessibility for the general population. Further development is needed to enhance the readability of AI-generated medical content.