Balci Ali Safa, Çakmak Semih
Department of Ophthalmology, Sehit Prof. Dr. Ilhan Varank Sancaktepe Training and Research Hospital, University of Health Sciences, Istanbul, Türkiye.
Department of Ophthalmology, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye.
Ophthalmic Epidemiol. 2025 Mar 28:1-6. doi: 10.1080/09286586.2025.2484760.
This study aimed to evaluate the accuracy and readability of responses generated by ChatGPT-4o, an advanced large language model, to frequently asked patient-centered questions about keratoconus.
A cross-sectional, observational study was conducted in which ChatGPT-4o answered 30 questions that patients with keratoconus might commonly ask. The accuracy of the responses was evaluated by two board-certified ophthalmologists and scored on a scale of 1 to 5. Readability was assessed using the Simple Measure of Gobbledygook (SMOG), Flesch-Kincaid Grade Level (FKGL), and Flesch Reading Ease (FRE) scores. Descriptive, treatment-related, and follow-up-related questions were analyzed, and statistical comparisons were performed between these categories.
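The three readability indices named above are defined by standard published formulas based on word, sentence, syllable, and polysyllable counts. As an illustration (not the study's own code; the counting of syllables from raw text is assumed to be done upstream), they can be sketched as:

```python
import math

def readability_scores(words, sentences, syllables, polysyllables):
    """Compute FRE, FKGL, and SMOG from raw text counts.

    words / sentences / syllables are totals for the text;
    polysyllables is the number of words with 3+ syllables
    (used only by SMOG).
    """
    wps = words / sentences      # mean words per sentence
    spw = syllables / words      # mean syllables per word

    # Flesch Reading Ease: higher = easier (0-100 typical range)
    fre = 206.835 - 1.015 * wps - 84.6 * spw

    # Flesch-Kincaid Grade Level: approximate US school grade
    fkgl = 0.39 * wps + 11.8 * spw - 15.59

    # SMOG grade, normalized to a 30-sentence sample
    smog = 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

    return {"FRE": round(fre, 2), "FKGL": round(fkgl, 2), "SMOG": round(smog, 2)}
```

An FRE near 27, as reported below, corresponds to "very difficult" text on the conventional Flesch scale, while FKGL and SMOG values around 15 indicate college-level reading difficulty.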
The mean accuracy score for the responses was 4.48 ± 0.57 on a 5-point Likert scale. Interrater reliability, with an intraclass correlation coefficient of 0.769, indicated a strong level of agreement. Readability scores revealed a SMOG score of 15.49 ± 1.74, an FKGL score of 14.95 ± 1.95, and an FRE score of 27.41 ± 9.71, indicating that a high level of education is required to comprehend the responses. There was no significant difference in accuracy among the question categories (p = 0.161), but readability varied significantly, with treatment-related questions being the easiest to understand.
ChatGPT-4o provides highly accurate responses to patient-centered questions about keratoconus, though the complexity of its language may limit accessibility for the general population. Further development is needed to enhance the readability of AI-generated medical content.