Betz Gregor, Richardson Kyle
Karlsruhe Institute of Technology, Department of Philosophy, Karlsruhe, Germany.
Allen Institute for Artificial Intelligence, Aristo, Seattle, WA, United States.
Front Artif Intell. 2022 Oct 18;5:900943. doi: 10.3389/frai.2022.900943. eCollection 2022.
Neural language models (NLMs) are susceptible to producing inconsistent output. This paper proposes a new diagnosis of, as well as a novel remedy for, NLMs' incoherence. We train NLMs on synthetic text corpora that are created by simulating text production in a society. For diagnostic purposes, we explicitly model the individual belief systems of the artificial agents (authors) who produce the corpus texts. NLMs trained on those texts can be shown to aggregate the judgments of individual authors during pre-training according to sentence-wise vote ratios (roughly, reporting frequencies), which inevitably leads to so-called discursive dilemmas: aggregate judgments are inconsistent even though all individual belief states are consistent. As a remedy for such inconsistencies, we develop a self-training procedure, inspired by the concept of reflective equilibrium, that effectively reduces the extent of logical incoherence in a model's belief system, corrects global mis-confidence, and eventually allows the model to settle on a new, epistemically superior belief state. Thus, social choice theory helps to explain why NLMs are prone to producing inconsistencies; epistemology suggests how to get rid of them.
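A minimal sketch of the discursive dilemma the abstract refers to, using hypothetical agents and propositions (not taken from the paper's corpus): each author holds a consistent belief state over two premises and their conjunction, yet proposition-wise majority aggregation, analogous to the sentence-wise vote ratios described above, yields an inconsistent collective judgment.

```python
# Hypothetical illustration of a discursive dilemma.
# Three authors judge propositions p, q, and their conjunction p & q.
agents = {
    "author_1": {"p": True,  "q": True,  "p_and_q": True},
    "author_2": {"p": True,  "q": False, "p_and_q": False},
    "author_3": {"p": False, "q": True,  "p_and_q": False},
}

# Every individual belief state is logically consistent.
for name, beliefs in agents.items():
    assert beliefs["p_and_q"] == (beliefs["p"] and beliefs["q"]), name

def majority(prop):
    """Accept a proposition iff more than half of the agents accept it."""
    votes = [beliefs[prop] for beliefs in agents.values()]
    return sum(votes) > len(votes) / 2

# Proposition-wise aggregation (roughly, what reporting frequencies encode).
aggregate = {prop: majority(prop) for prop in ["p", "q", "p_and_q"]}
print(aggregate)
# {'p': True, 'q': True, 'p_and_q': False}
# The aggregate accepts p and q but rejects p & q, which is inconsistent,
# even though each individual author's judgments were consistent.
```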