Mesnard Benoît, Schirmann Aurélie, Branchereau Julien, Perrot Ophélie, Bogaert Guy, Neuzillet Yann, Lebret Thierry, Madec François-Xavier
Urology Department, Hôpital Foch, Suresnes, France.
Urology Department, Nantes University Hospital, Nantes, France.
Eur Urol Open Sci. 2024 Jan 30;60:44-46. doi: 10.1016/j.euros.2024.01.002. eCollection 2024 Feb.
The role of artificial intelligence (AI) in the medical domain is growing year on year. AI allows instant access to the latest scientific data in urological surgery, providing a level of theoretical knowledge that previously required several years of practice and training to acquire. To evaluate the capability of AI to provide robust data in a specialized domain, we submitted the in-service assessment of the European Board of Urology to three different AI tools: ChatGPT 3.5, ChatGPT 4.0, and Bard. The assessment consists of 100 single-best-answer multiple-choice questions, each with four options. We compared the responses of 736 human participants to the AI responses. The average score for the 736 participants was 67.20 points. ChatGPT 3.5 scored 59 points, ranking 570th. ChatGPT 4.0 scored 80 points, ranking 80th, just on the border of the top 10%. Google Bard scored 68 points, ranking 340th. Our study demonstrates that AI systems are capable of taking a urological examination and achieving satisfactory results. However, a critical perspective must be maintained, as current AI systems are not infallible. Finally, the role of AI in the acquisition of knowledge and the dissemination of information remains to be delineated.
We submitted questions from the European Diploma in Urological Surgery to three artificial intelligence (AI) systems. Our findings reveal that AI tools show remarkable performance in assessments of urological surgical knowledge. However, certain limitations were also observed.