ChatGPT-4o's performance on pediatric Vesicoureteral reflux.

Author information

Akyol Onder Esra Nagehan, Ensari Esra, Ertan Pelin

Affiliations

Aksaray University Training and Research Hospital, Department of Paediatric Nephrology, Aksaray, TR-68200, Turkey.

Antalya City Hospital, Department of Paediatric Nephrology, Antalya, TR-07080, Turkey.

Publication information

J Pediatr Urol. 2025 Apr;21(2):504-509. doi: 10.1016/j.jpurol.2024.12.002. Epub 2024 Dec 7.

DOI:10.1016/j.jpurol.2024.12.002

PMID:39694777

Abstract

INTRODUCTION

Vesicoureteral reflux (VUR) is a common congenital or acquired urinary disorder in children. Chat Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence-driven platform offering medical information. This research aims to assess the reliability and readability of ChatGPT-4o's answers regarding pediatric VUR for a general, non-medical audience.

MATERIALS AND METHODS

Twenty of the most frequently asked English-language questions about VUR in children were used to evaluate ChatGPT-4o's responses. Two independent reviewers rated the reliability and quality using the Global Quality Scale (GQS) and a modified version of the DISCERN tool. The readability of ChatGPT responses was assessed through the Flesch Reading Ease (FRE) Score, Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG).
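For readers unfamiliar with the five readability indices named above, the following is a minimal sketch of their commonly published formulas. It assumes the text counts (words, sentences, syllables, and so on) are already available; the function names and sample counts are illustrative only, and the abstract does not state which calculator or formula variants the authors actually used.

```python
import math

# Standard published readability formulas. All inputs are raw counts
# taken from a text; how syllables and "complex" words are counted
# varies between tools, so results may differ slightly from any
# particular online calculator.

def flesch_reading_ease(words, sentences, syllables):
    # Higher score = easier text.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    # Approximate US school grade level.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words, sentences, complex_words):
    # complex_words: words with three or more syllables.
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))

def smog(sentences, polysyllables):
    # polysyllables: count of words with three or more syllables.
    return 1.0430 * math.sqrt(polysyllables * (30.0 / sentences)) + 3.1291

def coleman_liau(letters, words, sentences):
    letters_per_100_words = letters / words * 100.0
    sentences_per_100_words = sentences / words * 100.0
    return (0.0588 * letters_per_100_words
            - 0.296 * sentences_per_100_words
            - 15.8)

if __name__ == "__main__":
    # Hypothetical counts for a short passage, not data from the study.
    print(round(flesch_reading_ease(100, 5, 150), 3))
    print(round(flesch_kincaid_grade(100, 5, 150), 2))
```

On the Flesch Reading Ease scale, lower values indicate harder text, whereas the other four indices approximate the years of schooling needed to understand the text, so higher values indicate harder text.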

RESULTS

Median mDISCERN and GQS scores were 4 (4-5) and 5 (3-5), respectively. According to the mDISCERN score, ChatGPT's responses showed moderate (55%) or good (45%) reliability, and according to the GQS, 95% were of high quality. The mean ± standard deviation scores for FRE, FKGL, SMOG, GFI, and CLI of the text were 26 ± 12, 15 ± 2.5, 16.3 ± 2, 18.8 ± 2.9, and 15.3 ± 2.2, respectively, indicating a high level of reading difficulty.

DISCUSSION

While ChatGPT-4o offers accurate and high-quality information about pediatric VUR, its readability poses a challenge: the content is difficult for a general audience to understand.

CONCLUSION

ChatGPT provides high-quality, accessible information about VUR. However, improving readability should be a priority to make this information more user-friendly for a broader audience.
