

Could ChatGPT get an engineering degree? Evaluating higher education vulnerability to AI assistants.

Authors

Borges Beatriz, Foroutan Negar, Bayazit Deniz, Sotnikova Anna, Montariol Syrielle, Nazaretsky Tanya, Banaei Mohammadreza, Sakhaeirad Alireza, Servant Philippe, Neshaei Seyed Parsa, Frej Jibril, Romanou Angelika, Weiss Gail, Mamooler Sepideh, Chen Zeming, Fan Simin, Gao Silin, Ismayilzada Mete, Paul Debjit, Schwaller Philippe, Friedli Sacha, Jermann Patrick, Käser Tanja, Bosselut Antoine

Affiliation

École Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland.

Publication

Proc Natl Acad Sci U S A. 2024 Dec 3;121(49):e2414955121. doi: 10.1073/pnas.2414955121. Epub 2024 Nov 26.

DOI: 10.1073/pnas.2414955121
PMID: 39589890
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11626143/
Abstract

AI assistants, such as ChatGPT, are being increasingly used by students in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes. We conceptualize these challenges through the lens of vulnerability, the potential for university assessments and learning outcomes to be impacted by student use of generative AI. We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level Science, Technology, Engineering, and Mathematics (STEM) courses. Specifically, we compile a dataset of textual assessment questions from 50 courses at the École polytechnique fédérale de Lausanne (EPFL) and evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer these questions. We use eight prompting strategies to produce responses and find that GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. When grouping courses in our dataset by degree program, these systems already pass the nonproject assessments of large numbers of core courses in various degree programs, posing risks to higher education accreditation that will be amplified as these models improve. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.
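The abstract reports two aggregate numbers: mean accuracy across prompting strategies (65.8%) and the fraction of questions answered correctly by at least one strategy (85.1%). A minimal sketch of how such metrics can be computed, using invented grading data and hypothetical strategy names (the paper's actual evaluation pipeline is not described here):

```python
# Two aggregate metrics over per-strategy grading results:
# mean accuracy, and "coverage" -- the fraction of questions
# answered correctly by at least one prompting strategy.

def aggregate_metrics(grades: dict[str, list[bool]]) -> tuple[float, float]:
    """grades maps strategy name -> per-question correctness flags."""
    strategies = list(grades.values())
    n_questions = len(strategies[0])
    # Mean accuracy: average the per-strategy accuracies.
    mean_acc = sum(sum(flags) / n_questions for flags in strategies) / len(strategies)
    # Coverage: a question counts if any strategy answered it correctly.
    covered = sum(any(flags[q] for flags in strategies) for q in range(n_questions))
    return mean_acc, covered / n_questions

# Toy example: 3 strategies, 4 questions (data invented for illustration).
grades = {
    "zero_shot": [True, False, False, True],
    "chain_of_thought": [True, True, False, False],
    "few_shot": [False, True, False, True],
}
mean_acc, coverage = aggregate_metrics(grades)
print(f"mean accuracy {mean_acc:.2f}, coverage {coverage:.2f}")
# -> mean accuracy 0.50, coverage 0.75
```

Coverage is always at least the mean accuracy, which is why the "at least one strategy" figure (85.1%) exceeds the average (65.8%).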


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9df/11626143/cafe2d21d1b1/pnas.2414955121fig01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9df/11626143/eb15a442cceb/pnas.2414955121fig02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9df/11626143/cfa1439c3447/pnas.2414955121fig03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9df/11626143/15534dde575b/pnas.2414955121fig04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9df/11626143/4c608ba7fc64/pnas.2414955121fig05.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9df/11626143/eb09478b43d4/pnas.2414955121fig06.jpg

Similar Articles

1. Could ChatGPT get an engineering degree? Evaluating higher education vulnerability to AI assistants. Proc Natl Acad Sci U S A. 2024 Dec 3;121(49):e2414955121. doi: 10.1073/pnas.2414955121. Epub 2024 Nov 26.
2. Can Generative AI and ChatGPT Break Human Supremacy in Mathematics and Reshape Competence in Cognitive-Demanding Problem-Solving Tasks? J Intell. 2025 Apr 2;13(4):43. doi: 10.3390/jintelligence13040043.
3. Is ChatGPT 'ready' to be a learning tool for medical undergraduates and will it perform equally in different subjects? Comparative study of ChatGPT performance in tutorial and case-based learning questions in physiology and biochemistry. Med Teach. 2024 Nov;46(11):1441-1447. doi: 10.1080/0142159X.2024.2308779. Epub 2024 Jan 31.
4. Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study. ArXiv. 2024 Jan 23:arXiv:2402.01693v1.
5. Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT. Clin Orthop Relat Res. 2023 Aug 1;481(8):1623-1630. doi: 10.1097/CORR.0000000000002704. Epub 2023 May 23.
6. ChatGPT-A double-edged sword for healthcare education? Implications for assessments of dental students. Eur J Dent Educ. 2024 Feb;28(1):206-211. doi: 10.1111/eje.12937. Epub 2023 Aug 7.
7. Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study. J Med Internet Res. 2024 Apr 17;26:e56655. doi: 10.2196/56655.
8. Could ChatGPT Pass the UK Radiology Fellowship Examinations? Acad Radiol. 2024 May;31(5):2178-2182. doi: 10.1016/j.acra.2023.11.026. Epub 2023 Dec 29.
9. Assessment of Resident and AI Chatbot Performance on the University of Toronto Family Medicine Residency Progress Test: Comparative Study. JMIR Med Educ. 2023 Sep 19;9:e50514. doi: 10.2196/50514.
10. Quality assurance and validity of AI-generated single best answer questions. BMC Med Educ. 2025 Feb 25;25(1):300. doi: 10.1186/s12909-025-06881-w.
