A generative AI-driven interactive listening assessment task.

Authors

Runge Andrew, Attali Yigal, LaFlair Geoffrey T, Park Yena, Church Jacqueline

Affiliation

Duolingo, Pittsburgh, PA, United States.

Publication information

Front Artif Intell. 2024 Nov 4;7:1474019. doi: 10.3389/frai.2024.1474019. eCollection 2024.

DOI:10.3389/frai.2024.1474019

PMID:39559344

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11571064/

Abstract

INTRODUCTION

Assessments of interactional competence have traditionally been limited in large-scale language assessments. The listening portion suffers from construct underrepresentation, whereas the speaking portion suffers from limited task formats such as in-person interviews or role plays. Human-delivered tasks are challenging to administer at large scales, while automated assessments are typically very narrow in their assessment of the construct because they have carried over the limitations of traditional paper-based tasks to digital formats. However, computer-based assessments do allow for more interactive, automatically administered tasks, but come with increased complexity in task creation. Large language models present new opportunities for enhanced automated item generation (AIG) processes that can create complex content types and tasks at scale that support richer assessments.

METHODS

This paper describes the use of such methods to generate content at scale for an interactive listening measure of interactional competence for the Duolingo English Test (DET), a large-scale, high-stakes test of English proficiency. The Interactive Listening task assesses test takers' ability to participate in a full conversation, resulting in a more authentic assessment of interactive listening ability than prior automated assessments by positing comprehension and interaction as purposes of listening.

RESULTS AND DISCUSSION

The results of a pilot of 713 tasks with hundreds of responses per task, along with the results of human review, demonstrate the feasibility of a human-in-the-loop, generative AI-driven approach for automatic creation of complex educational assessments at scale.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4699/11571064/a89720df4f95/frai-07-1474019-g001.jpg
