Similar Articles

1. Emergent linguistic structure in artificial neural networks trained by self-supervision. Proc Natl Acad Sci U S A. 2020 Dec 1;117(48):30046-30054. doi: 10.1073/pnas.1907367117. Epub 2020 Jun 3.
2. Neural network processing of natural language: II. Towards a unified model of corticostriatal function in learning sentence comprehension and non-linguistic sequencing. Brain Lang. 2009 May-Jun;109(2-3):80-92. doi: 10.1016/j.bandl.2008.08.002. Epub 2008 Oct 5.
3. Emergence of hierarchical structure mirroring linguistic composition in a recurrent neural network. Neural Netw. 2011 May;24(4):311-20. doi: 10.1016/j.neunet.2010.12.006. Epub 2011 Jan 12.
4. The semantics-syntax interface: Learning grammatical categories and hierarchical syntactic structure through semantics. J Exp Psychol Learn Mem Cogn. 2021 Jul;47(7):1141-1155. doi: 10.1037/xlm0001044.
5. Mechanisms for handling nested dependencies in neural-network language models and humans. Cognition. 2021 Aug;213:104699. doi: 10.1016/j.cognition.2021.104699. Epub 2021 Apr 30.
6. The logical syntax of number words: theory, acquisition and processing. Cognition. 2009 Apr;111(1):24-45. doi: 10.1016/j.cognition.2008.12.008. Epub 2009 Feb 13.
7. Structured Semantic Knowledge Can Emerge Automatically from Predicting Word Sequences in Child-Directed Speech. Front Psychol. 2018 Feb 22;9:133. doi: 10.3389/fpsyg.2018.00133. eCollection 2018.
8. Combined eye tracking and fMRI reveals neural basis of linguistic predictions during sentence comprehension. Cortex. 2015 Jul;68:33-47. doi: 10.1016/j.cortex.2015.04.011. Epub 2015 Apr 27.
9. Linguistic generalization and compositionality in modern artificial neural networks. Philos Trans R Soc Lond B Biol Sci. 2020 Feb 3;375(1791):20190307. doi: 10.1098/rstb.2019.0307. Epub 2019 Dec 16.
10. Age of acquisition effects differ across linguistic domains in sign language: EEG evidence. Brain Lang. 2020 Jan;200:104708. doi: 10.1016/j.bandl.2019.104708. Epub 2019 Nov 4.

Articles Citing This Paper

1. Reduction of supervision for biomedical knowledge discovery. BMC Bioinformatics. 2025 Sep 1;26(1):225. doi: 10.1186/s12859-025-06187-0.
2. Reading comprehension in L1 and L2 readers: neurocomputational mechanisms revealed through large language models. NPJ Sci Learn. 2025 Jul 10;10(1):46. doi: 10.1038/s41539-025-00337-y.
3. A multimodal transformer-based tool for automatic generation of concreteness ratings across languages. Commun Psychol. 2025 Jul 8;3(1):100. doi: 10.1038/s44271-025-00280-z.
4. EMTeC: A corpus of eye movements on machine-generated texts. Behav Res Methods. 2025 Jun 3;57(7):189. doi: 10.3758/s13428-025-02677-4.
5. Leveraging pretrained deep protein language model to predict peptide collision cross section. Commun Chem. 2025 May 6;8(1):137. doi: 10.1038/s42004-025-01540-z.
6. Natural language processing models reveal neural dynamics of human conversation. Nat Commun. 2025 Apr 9;16(1):3376. doi: 10.1038/s41467-025-58620-w.
7. Shortening Psychological Scales: Semantic Similarity Matters. Educ Psychol Meas. 2025 Feb 24:00131644251319047. doi: 10.1177/00131644251319047.
8. How Can Deep Neural Networks Inform Theory in Psychological Science? Curr Dir Psychol Sci. 2024 Oct;33(5):325-333. doi: 10.1177/09637214241268098. Epub 2024 Sep 11.
9. Pre-trained artificial intelligence language model represents pragmatic language variability central to autism and genetically related phenotypes. Autism. 2025 May;29(5):1346-1358. doi: 10.1177/13623613241304488. Epub 2024 Dec 20.
10. MacBehaviour: An R package for behavioural experimentation on large language models. Behav Res Methods. 2024 Dec 18;57(1):19. doi: 10.3758/s13428-024-02524-y.

References Cited in This Paper

1. Poverty of the stimulus revisited. Cogn Sci. 2011 Sep-Oct;35(7):1207-42. doi: 10.1111/j.1551-6709.2011.01189.x. Epub 2011 Aug 8.
2. Rethinking language: how probabilities shape the words we use. Proc Natl Acad Sci U S A. 2011 Mar 8;108(10):3825-6. doi: 10.1073/pnas.1100760108. Epub 2011 Feb 23.
3. Early language acquisition: cracking the speech code. Nat Rev Neurosci. 2004 Nov;5(11):831-43. doi: 10.1038/nrn1533.
4. Perception viewed as an inverse problem. Vision Res. 2001 Nov;41(24):3145-61. doi: 10.1016/s0042-6989(01)00173-0.
5. Broken agreement. Cogn Psychol. 1991 Jan;23(1):45-93. doi: 10.1016/0010-0285(91)90003-7.

Emergent linguistic structure in artificial neural networks trained by self-supervision.

Affiliations

Computer Science Department, Stanford University, Stanford, CA 94305.

Publication Information

Proc Natl Acad Sci U S A. 2020 Dec 1;117(48):30046-30054. doi: 10.1073/pnas.1907367117. Epub 2020 Jun 3.

DOI: 10.1073/pnas.1907367117
PMID: 32493748
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7720155/

Abstract

This paper explores the knowledge of linguistic structure learned by large artificial neural networks, trained via self-supervision, whereby the model simply tries to predict a masked word in a given context. Human language communication is via sequences of words, but language understanding requires constructing rich hierarchical structures that are never observed explicitly. The mechanisms for this have been a prime mystery of human language acquisition, while engineering work has mainly proceeded by supervised learning on treebanks of sentences hand labeled for this latent structure. However, we demonstrate that modern deep contextual language models learn major aspects of this structure, without any explicit supervision. We develop methods for identifying linguistic hierarchical structure emergent in artificial neural networks and demonstrate that components in these models focus on syntactic grammatical relationships and anaphoric coreference. Indeed, we show that a linear transformation of learned embeddings in these models captures parse tree distances to a surprising degree, allowing approximate reconstruction of the sentence tree structures normally assumed by linguists. These results help explain why these models have brought such large improvements across many language-understanding tasks.
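The abstract's central technical claim is that a simple linear transformation of the learned embeddings approximates distances in the sentence's parse tree. As a rough illustration of that idea, the following PyTorch sketch trains such a linear distance probe on frozen contextual embeddings; the hidden width, probe rank, learning rate, and loss normalization here are illustrative assumptions, not the authors' exact implementation.

import torch

# Illustrative sizes: a BERT-style hidden width and a low-rank probe.
HIDDEN, RANK = 768, 128

class StructuralProbe(torch.nn.Module):
    """Linear map B chosen so that squared L2 distance between transformed
    word vectors approximates path distance in the gold parse tree."""
    def __init__(self, hidden=HIDDEN, rank=RANK):
        super().__init__()
        self.B = torch.nn.Parameter(torch.randn(hidden, rank) * 0.01)

    def forward(self, h):            # h: (seq_len, hidden) contextual embeddings
        t = h @ self.B               # (seq_len, rank)
        diff = t.unsqueeze(1) - t.unsqueeze(0)
        return (diff ** 2).sum(-1)   # (seq_len, seq_len) predicted squared distances

probe = StructuralProbe()
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

def train_step(h, gold_tree_dist):
    # gold_tree_dist: (seq_len, seq_len) path lengths in the annotated parse
    # tree. The embeddings h stay frozen; only the probe B is trained, so the
    # probe measures what structure the embeddings already encode.
    opt.zero_grad()
    pred = probe(h)
    n = h.shape[0]
    loss = (pred - gold_tree_dist).abs().sum() / (n * n)  # length-normalized L1
    loss.backward()
    opt.step()
    return loss.item()

An approximate tree can then be read off, for example, by taking a minimum spanning tree over the predicted pairwise distances, which yields the kind of reconstructed sentence structures the abstract compares against linguists' annotations.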
