Meo Sultan Ayoub, Al-Khlaiwi Thamir, AbuKhalaf Abdulelah Adnan, Meo Anusha Sultan, Klonoff David C
Department of Physiology, College of Medicine, King Saud University, Riyadh, Saudi Arabia.
College of Medicine, King Saud University, Riyadh, Saudi Arabia.
J Diabetes Sci Technol. 2025 May;19(3):705-710. doi: 10.1177/19322968231203987. Epub 2023 Oct 5.
The present study aimed to investigate the knowledge level of Bard and ChatGPT in the areas of endocrinology, diabetes, and diabetes technology through a multiple-choice question (MCQ) examination format.
Initially, a bank of 100 MCQs covering endocrinology, diabetes, and diabetes technology was established. The MCQs were drawn from physiology and medical textbooks and from academic examination pools in these areas. The study team members reviewed the MCQ contents to ensure that they were relevant to endocrinology, diabetes, and diabetes technology. Fifty MCQs covered endocrinology, and the other 50 covered diabetes and diabetes technology. The knowledge level of Google's Bard and ChatGPT was assessed with this MCQ-based examination.
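To make the assessment procedure concrete, the sketch below shows one way such an MCQ-based grading could be automated. It is a minimal illustration, not the authors' code: the ask_chatbot() function, the toy question bank, and the one-mark-per-correct-response scoring are assumptions based on the description above.

    # Minimal sketch (not the authors' code) of the grading procedure:
    # one mark per correct response, section score as a percentage.
    # ask_chatbot() is a hypothetical stand-in for querying ChatGPT or Bard.

    def ask_chatbot(question: str, options: dict[str, str]) -> str:
        # Placeholder: in practice this would submit the MCQ to the model
        # and parse the chosen option letter from its reply.
        return "A"

    def score_section(bank: list[dict]) -> tuple[int, float]:
        """Grade a model on one MCQ section; returns (marks, percentage)."""
        marks = sum(
            1 for item in bank
            if ask_chatbot(item["question"], item["options"]) == item["answer"]
        )
        return marks, 100.0 * marks / len(bank)

    # Illustrative two-item toy bank (not from the study's question pool):
    bank = [
        {"question": "Which hormone lowers blood glucose?",
         "options": {"A": "Insulin", "B": "Glucagon"}, "answer": "A"},
        {"question": "Which gland secretes thyroxine?",
         "options": {"A": "Pancreas", "B": "Thyroid"}, "answer": "B"},
    ]
    print(score_section(bank))  # -> (1, 50.0) with the stub above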
In the endocrinology section of the examination, ChatGPT obtained 29 marks (correct responses) of 50 (58%), and Bard obtained the same score of 29 of 50 (58%). However, in the diabetes technology section, ChatGPT obtained 23 marks of 50 (46%), and Bard obtained 20 marks of 50 (40%). Overall, on the entire examination, ChatGPT obtained 52 marks of 100 (52%) and Bard obtained 49 marks of 100 (49%). ChatGPT thus scored slightly higher than Bard, but neither tool reached a satisfactory score of at least 60% in endocrinology or in diabetes/diabetes technology.
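As a quick arithmetic check, the overall marks follow directly from summing the two section scores; the tally below simply restates the figures reported above.

    # Section marks as reported in the results; totals follow directly.
    section_marks = {
        "ChatGPT": {"endocrinology": 29, "diabetes technology": 23},
        "Bard": {"endocrinology": 29, "diabetes technology": 20},
    }
    for model, marks in section_marks.items():
        total = sum(marks.values())  # out of 100 questions, so total == percentage
        print(f"{model}: {total}/100 = {total}%")  # ChatGPT: 52/100 = 52%; Bard: 49/100 = 49%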
The overall MCQ-based performance of ChatGPT was slightly better than that of Google's Bard. However, neither ChatGPT nor Bard achieved satisfactory scores in endocrinology or diabetes/diabetes technology. The study indicates that Bard and ChatGPT have the potential to assist medical students and faculty in academic medical education settings, but both artificial intelligence tools need more up-to-date information in the fields of endocrinology, diabetes, and diabetes technology.