Generative AI without guardrails can harm learning: Evidence from high school mathematics.

Authors

Hamsa Bastani, Osbert Bastani, Alp Sungu, Haosen Ge, Özge Kabakcı, Rei Mariman

Affiliations

Department of Operations, Information, and Decisions, Wharton School, University of Pennsylvania, Philadelphia, PA 19104.

Wharton AI & Analytics, Philadelphia, PA 19104.

Publication information

Proc Natl Acad Sci U S A. 2025 Jul;122(26):e2422633122. doi: 10.1073/pnas.2422633122. Epub 2025 Jun 25.

DOI:10.1073/pnas.2422633122

PMID:40560616

Abstract

Generative AI is poised to revolutionize how humans work, and has already demonstrated promise in significantly improving human productivity. A key question is how generative AI affects learning, namely, how humans acquire new skills as they perform tasks. Learning is critical to long-term productivity, especially since generative AI is fallible and users must check its outputs. We study this question via a field experiment where we provide nearly a thousand high school math students with access to generative AI tutors. To understand the differential impact of tool design on learning, we deploy two generative AI tutors: one that mimics a standard ChatGPT interface ("GPT Base") and one with prompts designed to safeguard learning ("GPT Tutor"). Consistent with prior work, our results show that having GPT-4 access while solving problems significantly improves performance (48% improvement in grades for GPT Base and 127% for GPT Tutor). However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction in grades for GPT Base); i.e., unfettered access to GPT-4 can harm educational outcomes. These negative learning effects are largely mitigated by the safeguards in GPT Tutor. Without guardrails, students attempt to use GPT-4 as a "crutch" during practice problem sessions, and subsequently perform worse on their own. Thus, decision-makers must be cautious about design choices underlying generative AI deployments to preserve skill learning and long-term productivity.
