Lee Kong Chian School of Medicine, Nanyang Technological University Singapore, Singapore, Singapore.
Singapore Polytechnic, Singapore, Singapore.
Anat Sci Educ. 2024 Oct;17(7):1396-1405. doi: 10.1002/ase.2502. Epub 2024 Aug 21.
Large Language Models (LLMs) have the potential to improve education by personalizing learning. However, ChatGPT-generated content has been criticized for sometimes producing false, biased, and/or hallucinatory information. To evaluate AI's ability to return clear and accurate anatomy information, this study built a custom interactive, intelligent chatbot (Anatbuddy) through the OpenAI Application Programming Interface (API), which enables seamless AI-driven interactions within a secured cloud infrastructure. Anatbuddy was programmed through a Retrieval Augmented Generation (RAG) method to provide context-aware responses to user queries based on a predetermined knowledge base. To compare their outputs, various queries (i.e., prompts) on thoracic anatomy (n = 18) were fed into Anatbuddy and ChatGPT 3.5. A panel comprising three experienced anatomists evaluated both tools' responses for factual accuracy, relevance, completeness, coherence, and fluency on a 5-point Likert scale. These ratings were reviewed by a third party blinded to the study, who revised and finalized scores as needed. Anatbuddy's factual accuracy (mean ± SD = 4.78/5.00 ± 0.43; median = 5.00) was rated significantly higher (U = 84, p = 0.01) than ChatGPT's accuracy (4.11 ± 0.83; median = 4.00). No statistically significant differences were detected between the chatbots for the other variables. Given ChatGPT's current content knowledge limitations, we strongly recommend that the anatomy profession develop a custom AI chatbot for anatomy education utilizing a carefully curated knowledge base to ensure accuracy. Further research is needed to determine students' acceptance of custom chatbots for anatomy education and their influence on learning experiences and outcomes.
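The RAG approach the abstract describes can be sketched in miniature: retrieve the knowledge-base passages most similar to the user's query, then prepend them as context to the prompt sent to the LLM. The passage texts, function names, and toy bag-of-words similarity below are illustrative assumptions for exposition, not the Anatbuddy implementation; a production system would call an embeddings API and an LLM chat endpoint instead.

```python
# Minimal sketch of the Retrieval Augmented Generation (RAG) pattern:
# retrieve relevant passages from a curated knowledge base, then build a
# context-grounded prompt. The similarity function is a toy stand-in for
# a real embeddings model; all content here is hypothetical example data.
import math
from collections import Counter

KNOWLEDGE_BASE = [
    "The thoracic duct drains lymph from most of the body into the venous system.",
    "The phrenic nerve arises from C3-C5 and supplies the diaphragm.",
    "The azygos vein drains the posterior thoracic wall into the superior vena cava.",
]

def embed(text):
    """Toy bag-of-words 'embedding' standing in for an embeddings API call."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, k=1):
    """Return the k knowledge-base passages most similar to the query."""
    q = embed(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    """Assemble a context-grounded prompt for the LLM from retrieved passages."""
    context = "\n".join(retrieve(query))
    return ("Answer using ONLY the context below.\n"
            f"Context:\n{context}\n\nQuestion: {query}")

prompt = build_prompt("Which nerve supplies the diaphragm?")
print(prompt)
```

Constraining the model to answer only from retrieved, curated passages is what the study credits for Anatbuddy's higher factual-accuracy ratings relative to unconstrained ChatGPT 3.5.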