Comparing and assessing four AI chatbots' competence in economics.

Author affiliations

Department of Economics and Business, Kalamazoo College, Kalamazoo, Michigan, United States of America.

Department of Academic Development, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates.

Publication information

PLoS One. 2024 May 8;19(5):e0297804. doi: 10.1371/journal.pone.0297804. eCollection 2024.

DOI:10.1371/journal.pone.0297804

PMID:38718042

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11078351/

Abstract

Artificial Intelligence (AI) chatbots have emerged as powerful tools in modern academic endeavors, presenting both opportunities and challenges in the learning landscape. They can provide content information and analysis across most academic disciplines, but they differ significantly in the accuracy of their conclusions and explanations, as well as in word counts. This study examines four distinct AI chatbots, GPT-3.5, GPT-4, Bard, and LLaMA 2, for accuracy of conclusions and quality of explanations in the context of university-level economics. Using Bloom's taxonomy of cognitive learning complexity as a guiding framework, the study confronts the four AI chatbots with a standard test of university-level understanding of economics, as well as more advanced economics problems. The null hypothesis that all AI chatbots perform equally well on prompts that explore understanding of economics is rejected. Significant differences are observed across the four AI chatbots, and these differences widen as the complexity of the economics-related prompts increases. These findings are relevant to both students and educators: students can choose the most appropriate chatbots to better understand economics concepts and thought processes, while educators can design their instruction and assessment with an awareness of the support and resources students have access to through AI chatbot platforms.
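
The abstract does not state which statistical test was used to reject the null hypothesis of equal performance. As a rough illustration only, one common way to test whether several groups share the same accuracy is a Pearson chi-square test of homogeneity on a table of correct and incorrect answers. The sketch below assumes that approach; the counts, the 30-question total, and the function name `chi_square_homogeneity` are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the counts below are HYPOTHETICAL, not the paper's data,
# and the chi-square test is an assumed choice, not necessarily the authors' method.

# 95th percentile of the chi-square distribution with 3 degrees of freedom
# ((4 chatbots - 1) * (2 outcomes - 1) = 3).
CHI2_CRIT_DF3_ALPHA05 = 7.815


def chi_square_homogeneity(table):
    """Return the Pearson chi-square statistic for a rows x columns count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if every row had the same outcome proportions.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical (correct, incorrect) counts on a 30-question test.
counts = {
    "GPT-3.5": (20, 10),
    "GPT-4": (27, 3),
    "Bard": (21, 9),
    "LLaMA 2": (12, 18),
}

statistic = chi_square_homogeneity(list(counts.values()))
reject_null = statistic > CHI2_CRIT_DF3_ALPHA05
print(f"chi-square = {statistic:.2f}, reject equal accuracy at 5%: {reject_null}")
```

With real data, the same comparison could be repeated separately for each level of Bloom's taxonomy to check whether the gap between chatbots grows with prompt complexity, which is the pattern the abstract reports.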
