Boztas Asya Eylem, Ensari Esra
Health and Science University Dr. Behcet Uz Pediatric Diseases and Surgery Training and Research Hospital, Department of Pediatric Surgery, Kultur mh. Dr.Mustafa Enver Bey cd. No:32 D:10 Konak, Izmir, Turkey.
Antalya City Hospital, Department of Paediatric Nephrology, 07080, Antalya, Turkey.
J Pediatr Urol. 2025 Apr 30. doi: 10.1016/j.jpurol.2025.04.031.
The purpose of this study was to evaluate the accuracy and reproducibility of the answers given by ChatGPT-4o®, Gemini® and Copilot® to frequently asked questions about primary nocturnal enuresis in children.
Forty frequently asked questions about primary nocturnal enuresis were posed twice, one week apart, to ChatGPT-4o, Gemini and Copilot. One pediatric surgeon and one pediatric nephrologist independently scored the answers into 4 groups: comprehensive/correct (1), incomplete/partially correct (2), a mix of accurate and inaccurate/misleading (3), and completely inaccurate/irrelevant (4). The accuracy and reproducibility of each chatbot's answers were evaluated.
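The two outcome measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not define how accuracy or reproducibility was calculated, so the sketch assumes that accuracy is the share of answers scored 1 and that reproducibility is the share of questions receiving the same score in both rounds. The function names and the example scores are hypothetical and are not study data.

```python
def accuracy_rate(scores):
    """Percentage of answers scored 1 (comprehensive/correct).

    Assumption: only score 1 counts as a completely correct answer.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return 100 * sum(1 for s in scores if s == 1) / len(scores)


def reproducibility_rate(round1, round2):
    """Percentage of questions given the same score in both rounds.

    Assumption: an answer is 'reproducible' when its score category
    is unchanged between the first and second query.
    """
    if not round1 or len(round1) != len(round2):
        raise ValueError("rounds must be non-empty and of equal length")
    same = sum(1 for a, b in zip(round1, round2) if a == b)
    return 100 * same / len(round1)


# Hypothetical scores (1-4) for 8 questions asked in two rounds; not study data.
round1 = [1, 1, 1, 2, 1, 3, 1, 1]
round2 = [1, 1, 2, 2, 1, 3, 1, 4]

print(accuracy_rate(round1))                 # share of round-1 answers scored 1
print(reproducibility_rate(round1, round2))  # share of questions with unchanged score
```

If each rate is taken over the 40 questions, a rate of 92.5 % would correspond to 37 answers; the abstract does not state the denominator, so this reading is itself an assumption.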
Among these commonly used chatbots, ChatGPT-4o had the highest rate of completely correct responses, followed by Gemini and Copilot. With 92.5 % of its answers rated completely correct, ChatGPT-4o gave the most accurate responses of the three AI chatbots. Gemini answered 50 % of the questions completely correctly. Copilot was the least accurate chatbot on questions about nocturnal enuresis, with 45 % of its answers rated completely correct. In addition, 2.5 % of Copilot's responses were completely inaccurate/irrelevant. The reproducibility of ChatGPT-4o, Gemini and Copilot was 85 %, 77.5 % and 70 %, respectively.
ChatGPT-4o was the most successful of the three chatbots at providing accurate responses about nocturnal enuresis. Both patients and their parents can use it, especially for simple, low-complexity medical questions. However, it should be used alongside guidance from an expert healthcare professional.