Sarikaya Mehmet, Ozcan Siki Fatma, Ciftci Ilhan
Department of Pediatric Surgery, Faculty of Medicine, Selcuk University, Konya 42100, Turkey.
J Clin Med. 2025 Mar 30;14(7):2378. doi: 10.3390/jcm14072378.
This study aimed to evaluate the compliance of four different artificial intelligence applications (ChatGPT-4.0, Bing AI, Google Bard, and Perplexity) with the American Urological Association (AUA) vesicoureteral reflux (VUR) management guidelines. Fifty-one questions derived from the AUA guidelines were asked of each AI application. Two experienced paediatric surgeons independently scored the responses using a five-point Likert scale. Inter-rater agreement was analysed using the intraclass correlation coefficient (ICC). ChatGPT-4.0, Bing AI, Google Bard, and Perplexity received mean scores of 4.91, 4.85, 4.75, and 4.70, respectively. There was no statistically significant difference between the accuracy of the AI applications (p = 0.223). The inter-rater ICC values were above 0.9 for all platforms, indicating a high level of consistency in scoring. The evaluated AI applications agreed highly with the AUA VUR management guidelines. These results suggest that AI applications may be a potential tool for providing guideline-based recommendations in paediatric urology.
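To illustrate how an inter-rater ICC of the kind reported above can be obtained from two raters' Likert scores, here is a minimal sketch. The abstract does not state which ICC form the authors used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is an assumption here. The function name `icc_2_1` and the example scores are hypothetical and are not data from the study.

```python
from typing import List


def icc_2_1(ratings: List[List[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per rated item (e.g. one AI response) and
    one column per rater. This form is an assumption; the abstract
    does not say which ICC variant was computed.
    """
    n = len(ratings)      # number of rated items
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 items and 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: items (rows), raters (columns), residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + (k / n) * (ms_cols - ms_err)
    if denom == 0:
        # Happens when every score is identical, which is possible
        # when Likert ratings sit at the ceiling.
        raise ValueError("ICC is undefined: ratings have no variance")
    return (ms_rows - ms_err) / denom


# Hypothetical scores from two raters for six responses (not study data).
example_scores = [[5, 5], [5, 4], [4, 4], [5, 5], [3, 3], [4, 5]]
print(round(icc_2_1(example_scores), 3))
```

Because scores in a study like this cluster near the top of the scale, item-level variance can be small, and the coefficient is then sensitive to even a few disagreements; the explicit error for zero variance covers the extreme case where all scores are equal.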