A generative AI-driven interactive listening assessment task.

Author information

Runge Andrew, Attali Yigal, LaFlair Geoffrey T, Park Yena, Church Jacqueline

Affiliation

Duolingo, Pittsburgh, PA, United States.

Publication information

Front Artif Intell. 2024 Nov 4;7:1474019. doi: 10.3389/frai.2024.1474019. eCollection 2024.

Abstract

INTRODUCTION

Assessments of interactional competence have traditionally been limited in large-scale language assessments. The listening portion suffers from construct underrepresentation, whereas the speaking portion suffers from limited task formats such as in-person interviews or role plays. Human-delivered tasks are challenging to administer at large scales, while automated assessments are typically very narrow in their assessment of the construct because they have carried over the limitations of traditional paper-based tasks to digital formats. However, computer-based assessments do allow for more interactive, automatically administered tasks, but come with increased complexity in task creation. Large language models present new opportunities for enhanced automated item generation (AIG) processes that can create complex content types and tasks at scale that support richer assessments.

METHODS

This paper describes the use of such methods to generate content at scale for an interactive listening measure of interactional competence for the Duolingo English Test (DET), a large-scale, high-stakes test of English proficiency. The Interactive Listening task assesses test takers' ability to participate in a full conversation, resulting in a more authentic assessment of interactive listening ability than prior automated assessments by positing comprehension and interaction as purposes of listening.

RESULTS AND DISCUSSION

The results of a pilot of 713 tasks with hundreds of responses per task, along with the results of human review, demonstrate the feasibility of a human-in-the-loop, generative AI-driven approach for automatic creation of complex educational assessments at scale.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4699/11571064/a89720df4f95/frai-07-1474019-g001.jpg
