
Generative Adversarial Phonology: Modeling Unsupervised Phonetic and Phonological Learning With Neural Networks.

Authors

Beguš Gašper

Affiliations

Department of Linguistics, University of California, Berkeley, Berkeley, CA, United States.

Department of Linguistics, University of Washington, Seattle, WA, United States.

Publication Information

Front Artif Intell. 2020 Jul 8;3:44. doi: 10.3389/frai.2020.00044. eCollection 2020.

Abstract

Training deep neural networks on well-understood dependencies in speech data can provide new insights into how they learn internal representations. This paper argues that acquisition of speech can be modeled as a dependency between random space and generated speech data in the Generative Adversarial Network architecture and proposes a methodology to uncover the network's internal representations that correspond to phonetic and phonological properties. The Generative Adversarial architecture is uniquely appropriate for modeling phonetic and phonological learning because the network is trained on unannotated raw acoustic data and learning is unsupervised without any language-specific assumptions or pre-assumed levels of abstraction. A Generative Adversarial Network was trained on an allophonic distribution in English, in which voiceless stops surface as aspirated word-initially before stressed vowels, except if preceded by a sibilant [s]. The network successfully learns the allophonic alternation: the network's generated speech signal contains the conditional distribution of aspiration duration. The paper proposes a technique for establishing the network's internal representations that identifies latent variables that correspond to, for example, presence of [s] and its spectral properties. By manipulating these variables, we actively control the presence of [s] and its frication amplitude in the generated outputs. This suggests that the network learns to use latent variables as an approximation of phonetic and phonological representations. Crucially, we observe that the dependencies learned in training extend beyond the training interval, which allows for additional exploration of learning representations. The paper also discusses how the network's architecture and innovative outputs resemble and differ from linguistic behavior in language acquisition, speech disorders, and speech errors, and how well-understood dependencies in speech data can help us interpret how neural networks learn their representations.
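The setup described above trains a Generative Adversarial Network directly on raw, unannotated waveforms, with the generator mapping uniform latent noise to audio. Below is a minimal, illustrative PyTorch sketch of such a raw-audio GAN in the WaveGAN style the paper builds on; the latent size, layer counts, kernel parameters, and class names here are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

LATENT_DIM = 100   # size of the uniform latent space z (illustrative)
AUDIO_LEN = 16384  # ~1 s of 16 kHz audio, a typical raw-audio GAN window

def up(cin, cout):
    # One 4x upsampling block: output length = (L - 1)*4 - 2*11 + 25 + 1 = 4L.
    return nn.ConvTranspose1d(cin, cout, kernel_size=25, stride=4,
                              padding=11, output_padding=1)

class Generator(nn.Module):
    """Maps z ~ Uniform(-1, 1)^LATENT_DIM to a raw waveform."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT_DIM, 512 * 16)
        self.net = nn.Sequential(
            up(512, 256), nn.ReLU(),
            up(256, 128), nn.ReLU(),
            up(128, 64), nn.ReLU(),
            up(64, 32), nn.ReLU(),
            up(32, 1), nn.Tanh(),          # waveform samples in [-1, 1]
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 512, 16)   # 16 * 4^5 = 16384 = AUDIO_LEN
        return self.net(x)                 # shape: (batch, 1, AUDIO_LEN)

class Discriminator(nn.Module):
    """Scores raw waveforms; sees only unannotated audio during training."""
    def __init__(self):
        super().__init__()
        def down(cin, cout):
            return nn.Conv1d(cin, cout, kernel_size=25, stride=4, padding=11)
        self.net = nn.Sequential(
            down(1, 32), nn.LeakyReLU(0.2),
            down(32, 64), nn.LeakyReLU(0.2),
            down(64, 128), nn.LeakyReLU(0.2),
            down(128, 256), nn.LeakyReLU(0.2),
            down(256, 512), nn.LeakyReLU(0.2),
        )
        self.fc = nn.Linear(512 * 16, 1)

    def forward(self, x):
        return self.fc(self.net(x).flatten(1))  # unbounded critic score

# One illustrative forward pass with z drawn from the training interval.
G, D = Generator(), Discriminator()
z = torch.rand(8, LATENT_DIM) * 2 - 1           # Uniform(-1, 1)
fake = G(z)
print(fake.shape, D(fake).shape)                # (8, 1, 16384), (8, 1)
```

With a critic of this kind, training alternates discriminator and generator updates on real versus generated waveforms; after training, generated outputs can be measured acoustically (for example, aspiration duration) just like recorded speech.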

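The latent-interpretation technique in the abstract can likewise be sketched: hold one latent dimension at chosen values, inside and beyond the training interval, while sampling the rest, then measure an acoustic property of the generated output. Continuing the sketch above (reusing its `Generator` instance `G`), the dimension index `S_DIM` and the RMS proxy for frication amplitude are hypothetical placeholders; the paper identifies such variables empirically for each trained model.

```python
import torch

# Hypothetical index of a latent variable found (e.g., by relating latent
# values to acoustic measurements) to control the presence of [s].
S_DIM = 7  # placeholder; the actual index must be discovered per model

def generate_with_value(G, value, latent_dim=100, seed=0):
    """Hold one latent dimension at `value`; sample the rest uniformly."""
    torch.manual_seed(seed)
    z = torch.rand(1, latent_dim) * 2 - 1      # training interval: [-1, 1]
    z[0, S_DIM] = value
    with torch.no_grad():
        return G(z).squeeze()

def frication_rms(wave, sr=16000, window_s=0.05):
    """Crude proxy for [s] frication amplitude: RMS over the first 50 ms."""
    n = int(sr * window_s)
    return wave[:n].pow(2).mean().sqrt().item()

# Sweep the variable inside and *beyond* the training interval [-1, 1];
# the abstract reports that learned dependencies extend past that interval.
for v in [-1.0, 0.0, 1.0, 2.0, 4.0]:
    wave = generate_with_value(G, v)
    print(f"z[{S_DIM}] = {v:+.1f} -> frication RMS = {frication_rms(wave):.4f}")
```

If a dimension really encodes [s], the measured frication amplitude should change monotonically with its value, and, as the abstract reports, the effect persists for values well outside the interval seen in training.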

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9773/7861218/8cb2bc053be4/frai-03-00044-g0001.jpg
