Ondov Brian, Demner-Fushman Dina, Attal Kush
National Library of Medicine, Bethesda, MD, USA.
NYU Grossman School of Medicine, New York, NY, USA.
Proc Conf. 2024 Jun;2024:3961-3972. doi: 10.18653/v1/2024.naacl-long.220.
The cloze training objective of Masked Language Models makes them a natural choice for generating plausible distractors for human cloze questions. However, distractors must also be both distinct and incorrect, neither of which is directly addressed by existing neural methods. Evaluation of recent models has also relied largely on automated metrics, which cannot demonstrate the reliability or validity of human comprehension tests. In this work, we first formulate the pedagogically motivated objectives of plausibility, incorrectness, and distinctiveness in terms of conditional distributions from language models. Second, we present an unsupervised, interpretable method that uses these objectives to jointly optimize sets of distractors. Third, in a study with human participants, we compare the reliability and validity of the resulting cloze tests against those of tests produced by other methods. We find that tests from our method correlate more strongly with teacher-created comprehension tests than those from the state-of-the-art neural method, and that they are more internally consistent. Our implementation is freely available and can quickly create a multiple-choice cloze test from any given passage.
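The abstract does not spell out how the three objectives are combined, so the following is only a hypothetical toy sketch of what "jointly optimizing a set of distractors" can mean, not the authors' method. It assumes three inputs are already computed: a masked-LM probability for each candidate in the blank (plausibility), a probability that each candidate is *not* an acceptable answer (incorrectness), and a pairwise similarity between candidates (distinctiveness). The log-linear score, the function names `set_score` and `best_distractor_set`, the `weight` parameter, and all numbers are invented for illustration. Because the pairwise penalty depends on which other words are in the set, the choice is made over whole sets rather than word by word.

```python
import itertools
import math


def set_score(subset, plaus, incorrect, sim, weight=1.0):
    """Hypothetical objective for a candidate distractor set.

    plaus[w]     -- assumed masked-LM probability of w in the blank
    incorrect[w] -- assumed probability that w is NOT a correct answer
    sim[{a, b}]  -- assumed similarity between two candidates (higher = more alike)
    """
    # Reward each word for being plausible and for being wrong.
    individual = sum(math.log(plaus[w]) + math.log(incorrect[w]) for w in subset)
    # Penalize pairs of distractors that are too similar to each other.
    overlap = sum(sim[frozenset(pair)] for pair in itertools.combinations(subset, 2))
    return individual - weight * overlap


def best_distractor_set(candidates, k, plaus, incorrect, sim, weight=1.0):
    """Exhaustively score every k-subset and return the best one."""
    return max(
        itertools.combinations(candidates, k),
        key=lambda s: set_score(s, plaus, incorrect, sim, weight),
    )


# Toy example (made-up numbers): "She opened the ___ and let the fresh air in."
# with answer "window".
candidates = ["door", "gate", "curtains", "box", "shutters"]
plaus = {"door": 0.30, "gate": 0.10, "curtains": 0.15, "box": 0.05, "shutters": 0.12}
incorrect = {"door": 0.1, "gate": 0.8, "curtains": 0.7, "box": 0.95, "shutters": 0.4}

# Default low similarity for every pair, with two near-duplicate pairs.
sim = {frozenset(p): 0.1 for p in itertools.combinations(candidates, 2)}
sim[frozenset(("door", "gate"))] = 0.8
sim[frozenset(("curtains", "shutters"))] = 0.7

best = best_distractor_set(candidates, 3, plaus, incorrect, sim)
print(best)
```

In this toy run, "door" is dropped because it is likely an acceptable answer, and "shutters" loses out because it is both fairly plausible as a correct answer and too similar to "curtains". Exhaustive search is only practical for small candidate pools; a real system would need a smarter search over sets.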