Attali Yigal, Runge Andrew, LaFlair Geoffrey T, Yancey Kevin, Goodwin Sarah, Park Yena, von Davier Alina A
Duolingo, Pittsburgh, PA, United States.
Front Artif Intell. 2022 Jul 22;5:903077. doi: 10.3389/frai.2022.903077. eCollection 2022.
Automatic item generation (AIG) has the potential to greatly expand the number of items available for educational assessments, while simultaneously allowing for a more construct-driven approach to item development. However, the traditional item-modeling approach in AIG is limited in scope to content areas that are relatively easy to model (such as math problems) and depends on highly skilled content experts to create each model. In this paper, we describe the interactive reading task, a transformer-based deep language modeling approach for creating reading comprehension assessments. This approach enables a fully automated process for creating source passages together with a wide range of comprehension questions about those passages. The format of the questions allows automatic scoring of responses with high fidelity (e.g., selected-response questions). We present the results of a large-scale pilot of the interactive reading task, comprising hundreds of passages and thousands of questions. These passages were administered as part of the practice test of the Duolingo English Test. Human review of the materials and psychometric analyses of test-taker results demonstrate the feasibility of this approach for the automatic creation of complex educational assessments.