Ahmed H Shafeeq, Thrishulamurthy Chinmayee J
Department of Ophthalmology, Bangalore Medical College and Research Institute, Bangalore, India.
Eur J Ophthalmol. 2025 Mar;35(2):466-473. doi: 10.1177/11206721241272251. Epub 2024 Aug 7.
The rising popularity of chatbots, particularly OpenAI's ChatGPT, among the general public, and their utility in healthcare, is a topic of current controversy. The present study aimed to assess the reliability and accuracy of ChatGPT's responses to inquiries posed by parents, focusing on a range of pediatric ophthalmological and strabismus conditions.
Patient queries were collected via thematic analysis and posed to ChatGPT (version 3.5) in three separate instances each. The questions were divided into 12 domains, totalling 817 unique questions. Two experienced pediatric ophthalmologists rated the quality of every response on a Likert scale. All responses were evaluated for readability using the Flesch-Kincaid Grade Level (FKGL) and character count.
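The FKGL metric estimates the U.S. school grade needed to understand a text from average sentence length and average syllables per word. The sketch below illustrates the standard formula; the syllable counter is a crude vowel-group heuristic for demonstration only, not the tooling the study used.

```python
import re

def count_syllables(word):
    # Crude heuristic: count vowel groups, discount a silent trailing 'e'.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1 and not word.endswith(("le", "ee")):
        n -= 1
    return max(n, 1)

def fkgl(text):
    # Flesch-Kincaid Grade Level:
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

A mean FKGL of 14.49, as reported below, corresponds to college-level reading, well above the sixth-to-eighth-grade level commonly recommended for patient education materials.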
A total of 638 responses (78.09%) were scored completely correct, 156 (19.09%) correct but incomplete, and only 23 (2.81%) partially incorrect. None of the responses were scored completely incorrect. The mean FKGL score was 14.49 (95% CI 14.40-14.59) and the mean character count was 1825.33 (95% CI 1791.95-1858.70), with p = 0.831 and 0.697 respectively. The minimum and maximum FKGL scores were 10.60 and 18.34. FKGL predicted character count: R² = .012, F(1, 815) = 10.26, p = .001.
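The reported regression of character count on FKGL is a single-predictor ordinary least squares fit, where the F statistic follows directly from R² and the sample size. A minimal sketch, using synthetic data since the study's dataset is not public:

```python
import numpy as np

def simple_ols(x, y):
    # One-predictor least-squares fit, with R^2 and the F(1, n-2) statistic.
    slope, intercept = np.polyfit(x, y, 1)
    pred = slope * x + intercept
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1 - ss_res / ss_tot
    n = len(x)
    f = r2 / (1 - r2) * (n - 2)  # F statistic for a single predictor
    return slope, intercept, r2, f

# Synthetic example: FKGL-like predictor vs. a correlated response length.
rng = np.random.default_rng(0)
x = rng.normal(14.5, 1.0, 200)
y = 2.0 * x + rng.normal(0.0, 0.5, 200)
slope, intercept, r2, f = simple_ols(x, y)
```

As a sanity check on the reported figures: with n = 817 and R² = .012, F = .012/.988 × 815 ≈ 9.9, consistent with the reported F(1, 815) = 10.26 once rounding of R² is accounted for.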
ChatGPT provided accurate and reliable information for the majority of questions. However, the readability of its responses was well above the levels typically recommended for adult patient materials, which is concerning. Despite this limitation, it is evident that this technology will play a significant role in the healthcare industry.