Hepatogastroenterology Division, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Via Luigi de Crecchio, 80138, Naples, Italy.
Dig Liver Dis. 2024 Aug;56(8):1304-1311. doi: 10.1016/j.dld.2024.02.019. Epub 2024 Mar 19.
Conversational chatbots powered by large language models have sparked debate over their potential role in education and medical career examinations, and the scientific integrity of the outputs these chatbots produce remains contested in the literature.
This cross-sectional study evaluates the performance of ChatGPT 3.5 and Perplexity AI on questions from the 2023 Italian national residency admission exam (SSM23), comparing their results and concordance with those from previous years' SSMs.
Gastroenterology-related SSM23 questions were submitted to ChatGPT 3.5 and Perplexity AI, and their performance was evaluated in terms of correct responses and total scores. This process was repeated with questions from the three preceding years. Additionally, concordance between the chatbots was assessed using Cohen's kappa.
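Cohen's kappa measures agreement between two raters (here, the two chatbots' answer choices) corrected for agreement expected by chance. A minimal sketch of the computation, with hypothetical answer labels (the actual SSM answer data are not reproduced here):

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two equal-length lists of categorical answers.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from marginal totals.
    """
    n = len(rater1)
    # Observed agreement: fraction of items where both raters chose the same answer.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected agreement: product of each label's marginal frequencies, summed.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[label] * c2.get(label, 0) for label in c1) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical multiple-choice selections by two chatbots on the same questions:
chatbot_a = ["A", "C", "B", "D", "A", "B"]
chatbot_b = ["A", "C", "D", "D", "B", "B"]
print(round(cohen_kappa(chatbot_a, chatbot_b), 3))
```

A kappa near 0, such as the 2023 value reported below, indicates agreement little better than chance; a value of 1 indicates perfect agreement.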
On SSM23, ChatGPT 3.5 outperformed Perplexity AI with 94.11% correct responses, and its performance was consistent across years. Concordance between the two chatbots weakened in 2023 (κ = 0.203, P = 0.148), but ChatGPT consistently maintained a higher standard than Perplexity AI.
ChatGPT 3.5 and Perplexity AI show promise in addressing gastroenterological queries, highlighting potential educational roles. However, their variable performance mandates cautious use as supplementary tools alongside conventional study methods. Clear guidelines are crucial for educators to balance traditional approaches with innovative systems and thereby enhance educational standards.