Cilli Mesut, Ulutas Kemal Turker
Department of Urology, Hatay Reyhanlı State Hospital, Ministry of Health, Hatay, Turkey.
Department of Biochemistry, Hatay Reyhanlı State Hospital, Ministry of Health, Hatay, Turkey.
Indian J Urol. 2025 Apr-Jun;41(2):117-123. doi: 10.4103/iju.iju_409_24. Epub 2025 Apr 1.
This study examined the responses generated by ChatGPT-4 (an artificial intelligence language model) to queries on sexually transmitted urethritis in men and investigated whether informing the model that it was conversing with a urologist affected the accuracy of its responses.
A total of 272 questions based on the "Sexually Transmitted Infections Treatment Guidelines" of the US Centers for Disease Control and Prevention were prepared by a urology specialist and arranged to cover various levels of difficulty. The questions were presented in multiple-choice and true/false formats. Two groups were created: in Group 1, ChatGPT-4 was provided only with the questions, whereas in Group 2, it was explicitly told that it was conversing with a urology specialist. The accuracy of ChatGPT-4's responses was then evaluated.
In Group 1, the overall accuracy rate was 81% (94/116), whereas in Group 2, it was 77.6% (90/116). Subgroup A, which consisted of multiple-choice questions, had accuracy rates of 77.6% (45/58) for Group 1 and 74.1% (43/58) for Group 2. Subgroup B, which included true/false questions, had accuracy rates of 84.5% (49/58) for Group 1 and 81% (47/58) for Group 2. The mean accuracy score was higher in Group 1, whereas the mean completeness score was higher in Group 2.
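The overall accuracy figures above can be verified from the reported fractions. As a minimal sketch (not the authors' analysis code), the following recomputes the rates and applies a Pearson chi-square test without continuity correction to the 2×2 table of Group 1 vs. Group 2 overall accuracy; the choice of test is an assumption, not stated in the abstract.

```python
def accuracy(correct: int, total: int) -> float:
    """Accuracy as a percentage, e.g. 94/116 -> ~81.0."""
    return 100 * correct / total

def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    total = a + b + c + d
    stat = 0.0
    for obs, row, col in [(a, a + b, a + c), (b, a + b, b + d),
                          (c, c + d, a + c), (d, c + d, b + d)]:
        expected = row * col / total
        stat += (obs - expected) ** 2 / expected
    return stat

# Overall results as reported: Group 1 = 94/116 correct, Group 2 = 90/116.
g1_correct, g2_correct, n = 94, 90, 116
stat = chi_square_2x2(g1_correct, n - g1_correct,
                      g2_correct, n - g2_correct)
# stat ≈ 0.42, well below the 3.84 critical value (df = 1, alpha = 0.05),
# consistent with the paper's finding of no accuracy difference.
```

Under this assumed test, the between-group difference in overall accuracy is far from statistical significance, which matches the conclusion that the urologist framing did not improve accuracy.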
Providing ChatGPT-4 with the information that it was conversing with a urologist did not enhance the accuracy of its responses regarding sexually transmitted urethritis in men. The consistently high accuracy observed in ChatGPT-4's responses demonstrates that this system can be reliably used as a question-and-answer tool.