Pedagogically Aligned Objectives Create Reliable Automatic Cloze Tests.

Authors

Ondov Brian, Demner-Fushman Dina, Attal Kush

Affiliations

National Library of Medicine, Bethesda, MD, USA.

NYU Grossman School of Medicine, New York, NY, USA.

Publication information

Proc Conf. 2024 Jun;2024:3961-3972. doi: 10.18653/v1/2024.naacl-long.220.

DOI:10.18653/v1/2024.naacl-long.220

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12415984/

Abstract

The cloze training objective of Masked Language Models makes them a natural choice for generating plausible distractors for human cloze questions. However, distractors must also be both distinct and incorrect, neither of which is directly addressed by existing neural methods. Evaluation of recent models has also relied largely on automated metrics, which cannot demonstrate the reliability or validity of human comprehension tests. In this work, we first formulate the pedagogically motivated objectives of plausibility, incorrectness, and distinctiveness in terms of conditional distributions from language models. Second, we present an unsupervised, interpretable method that uses these objectives to jointly optimize sets of distractors. Third, we test the reliability and validity of the resulting cloze tests compared to other methods with human participants. We find our method has stronger correlation with teacher-created comprehension tests than the state-of-the-art neural method and is more internally consistent. Our implementation is freely available and can quickly create a multiple choice cloze test from any given passage.
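The abstract names three objectives (plausibility, incorrectness, distinctiveness) and says that sets of distractors are optimized jointly, but it does not give the formulas. As a rough illustration only, the sketch below shows one way such a joint score could be combined and searched. Everything in it is an assumption made for illustration: the function names `score_set` and `best_distractors`, the log-sum form of the score, the use of the most similar pair as the distinctiveness penalty, and all numeric values, which are invented rather than produced by a language model. It is not the authors' implementation.

```python
from itertools import combinations
from math import log

def score_set(cands, plausibility, incorrectness, similarity):
    """Hypothetical joint score for a distractor set (higher is better)."""
    # Per-word terms: the word should fit the blank (plausible)
    # and should not be an acceptable answer (incorrect).
    per_word = sum(log(plausibility[w]) + log(incorrectness[w]) for w in cands)
    # Set-level term: penalize the most similar pair, so near-duplicate
    # distractors are not chosen together (distinctiveness).
    max_sim = max(similarity[frozenset(pair)] for pair in combinations(cands, 2))
    return per_word + log(1.0 - max_sim)

def best_distractors(candidates, k, plausibility, incorrectness, similarity):
    """Exhaustively search all size-k sets; fine for a handful of candidates."""
    return max(
        combinations(sorted(candidates), k),
        key=lambda s: score_set(s, plausibility, incorrectness, similarity),
    )

# Invented toy values for a blank whose correct answer is "heart".
# In practice these would come from language-model conditional distributions.
plausibility = {"lung": 0.30, "lungs": 0.28, "liver": 0.20, "organ": 0.15}
# "organ" could arguably be a correct answer, so its incorrectness is low.
incorrectness = {"lung": 0.95, "lungs": 0.95, "liver": 0.95, "organ": 0.30}
similarity = {
    frozenset(("lung", "lungs")): 0.95,  # near-duplicates
    frozenset(("lung", "liver")): 0.40,
    frozenset(("lungs", "liver")): 0.40,
    frozenset(("lung", "organ")): 0.30,
    frozenset(("lungs", "organ")): 0.30,
    frozenset(("liver", "organ")): 0.30,
}

result = best_distractors(plausibility.keys(), 2, plausibility, incorrectness, similarity)
print(result)
```

With these toy numbers, the two most plausible candidates ("lung" and "lungs") are not selected together because they are nearly identical, and "organ" is avoided because it may be a valid answer. Scoring the set as a whole, rather than ranking words one at a time, is what makes the distinctiveness term possible.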
