Computer Science Department, Stanford University, Stanford, CA 94305;
Proc Natl Acad Sci U S A. 2020 Dec 1;117(48):30046-30054. doi: 10.1073/pnas.1907367117. Epub 2020 Jun 3.
This paper explores the knowledge of linguistic structure learned by large artificial neural networks, trained via self-supervision, whereby the model simply tries to predict a masked word in a given context. Human language communication is via sequences of words, but language understanding requires constructing rich hierarchical structures that are never observed explicitly. The mechanisms for this have been a prime mystery of human language acquisition, while engineering work has mainly proceeded by supervised learning on treebanks of sentences hand labeled for this latent structure. However, we demonstrate that modern deep contextual language models learn major aspects of this structure, without any explicit supervision. We develop methods for identifying linguistic hierarchical structure emergent in artificial neural networks and demonstrate that components in these models focus on syntactic grammatical relationships and anaphoric coreference. Indeed, we show that a linear transformation of learned embeddings in these models captures parse tree distances to a surprising degree, allowing approximate reconstruction of the sentence tree structures normally assumed by linguists. These results help explain why these models have brought such large improvements across many language-understanding tasks.
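The claim that a linear transformation of contextual embeddings captures parse tree distances can be illustrated with a minimal sketch of a structural probe. Here the probe matrix `B`, the embeddings, and all names are hypothetical toy stand-ins: in practice `B` is learned by regressing squared probe distances onto gold parse tree distances from a treebank, and the embeddings come from a deep contextual language model.

```python
import numpy as np

def probe_distance(B, h_i, h_j):
    """Squared L2 distance between two word embeddings after a
    linear transformation B -- the quantity a structural probe
    trains to match the distance between words in the parse tree."""
    diff = B @ (h_i - h_j)
    return float(diff @ diff)

# Toy example: random 4-dim "embeddings" and a rank-2 probe matrix.
# In the paper's setting, B would be fit so that these distances
# approximate tree distances, allowing approximate tree recovery.
rng = np.random.default_rng(0)
B = rng.normal(size=(2, 4))
h_chef, h_store = rng.normal(size=4), rng.normal(size=4)

d = probe_distance(B, h_chef, h_store)
```

Because the probe distance is a squared norm, it is non-negative, symmetric, and zero for identical embeddings, which is what lets it be interpreted as an (approximate) tree metric.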