


CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks.

Affiliation

Department of Linguistics, University of California, Berkeley, United States of America.

Publication

Neural Netw. 2021 Jul;139:305-325. doi: 10.1016/j.neunet.2021.03.017. Epub 2021 Mar 19.

DOI: 10.1016/j.neunet.2021.03.017
PMID: 33873122
Abstract

How can deep neural networks encode information that corresponds to words in human speech into raw acoustic data? This paper proposes two neural network architectures for modeling unsupervised lexical learning from raw acoustic inputs: ciwGAN (Categorical InfoWaveGAN) and fiwGAN (Featural InfoWaveGAN). These combine Deep Convolutional GAN architecture for audio data (WaveGAN; Donahue et al., 2019) with the information theoretic extension of GAN - InfoGAN (Chen et al., 2016) - and propose a new latent space structure that can model featural learning simultaneously with a higher level classification and allows for a very low-dimension vector representation of lexical items. In addition to the Generator and Discriminator networks, the architectures introduce a network that learns to retrieve latent codes from generated audio outputs. Lexical learning is thus modeled as emergent from an architecture that forces a deep neural network to output data such that unique information is retrievable from its acoustic outputs. The networks trained on lexical items from the TIMIT corpus learn to encode unique information corresponding to lexical items in the form of categorical variables in their latent space. By manipulating these variables, the network outputs specific lexical items. The network occasionally outputs innovative lexical items that violate training data, but are linguistically interpretable and highly informative for cognitive modeling and neural network interpretability. Innovative outputs suggest that phonetic and phonological representations learned by the network can be productively recombined and directly paralleled to productivity in human speech: a fiwGAN network trained on suit and dark outputs innovative start, even though it never saw start or even a [st] sequence in the training data. 
We also argue that setting latent featural codes to values well beyond training range results in almost categorical generation of prototypical lexical items and reveals underlying values of each latent code. Probing deep neural networks trained on well understood dependencies in speech bears implications for latent space interpretability and understanding how deep neural networks learn meaningful representations, as well as potential for unsupervised text-to-speech generation in the GAN framework.
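The abstract's central architectural contrast — ciwGAN's categorical latent codes versus fiwGAN's featural ones — can be sketched numerically. The snippet below is an illustrative reconstruction, not the paper's code: the function names and the least-significant-bit-first ordering are assumptions made for the example. It shows why a binary featural code yields the "very low-dimension vector representation of lexical items" the abstract describes, and why its features can be recombined independently.

```python
# Sketch (not the paper's implementation) of the two latent-code schemes
# described in the abstract. ciwGAN uses a one-hot "categorical" code, so
# encoding N lexical items takes N latent dimensions; fiwGAN uses binary
# "featural" codes, so N items fit in ceil(log2(N)) dimensions and codes
# can be recombined feature-by-feature.
import math

def ciwgan_code(item_index: int, n_items: int) -> list[int]:
    """One-hot categorical code: one latent dimension per lexical item."""
    code = [0] * n_items
    code[item_index] = 1
    return code

def fiwgan_code(item_index: int, n_items: int) -> list[int]:
    """Binary featural code: ceil(log2(n_items)) latent dimensions."""
    n_bits = max(1, math.ceil(math.log2(n_items)))
    # Least-significant bit first (an arbitrary choice for this sketch).
    return [(item_index >> b) & 1 for b in range(n_bits)]

# Eight lexical items: the categorical scheme needs 8 dimensions,
# the featural scheme only 3.
print(ciwgan_code(5, 8))   # [0, 0, 0, 0, 0, 1, 0, 0]
print(fiwgan_code(5, 8))   # [1, 0, 1]  (5 = 0b101, LSB first)
```

In the paper's setup these codes form part of the Generator's latent input, and an auxiliary network is trained to recover them from the generated audio, which is what forces each code to carry retrievable lexical information.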


Similar articles

1
CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks.
Neural Netw. 2021 Jul;139:305-325. doi: 10.1016/j.neunet.2021.03.017. Epub 2021 Mar 19.
2
Generative Adversarial Phonology: Modeling Unsupervised Phonetic and Phonological Learning With Neural Networks.
Front Artif Intell. 2020 Jul 8;3:44. doi: 10.3389/frai.2020.00044. eCollection 2020.
3
Affective Latent Representation of Acoustic and Lexical Features for Emotion Recognition.
Sensors (Basel). 2020 May 4;20(9):2614. doi: 10.3390/s20092614.
4
Audio-Based Drone Detection and Identification Using Deep Learning Techniques with Dataset Enhancement through Generative Adversarial Networks.
Sensors (Basel). 2021 Jul 21;21(15):4953. doi: 10.3390/s21154953.
5
Information-Based Boundary Equilibrium Generative Adversarial Networks with Interpretable Representation Learning.
Comput Intell Neurosci. 2018 Oct 17;2018:6465949. doi: 10.1155/2018/6465949. eCollection 2018.
6
Adversarial active learning for the identification of medical concepts and annotation inconsistency.
J Biomed Inform. 2020 Aug;108:103481. doi: 10.1016/j.jbi.2020.103481. Epub 2020 Jul 18.
7
Improving Speech Emotion Recognition With Adversarial Data Augmentation Network.
IEEE Trans Neural Netw Learn Syst. 2022 Jan;33(1):172-184. doi: 10.1109/TNNLS.2020.3027600. Epub 2022 Jan 5.
8
μ-law SGAN for generating spectra with more details in speech enhancement.
Neural Netw. 2021 Apr;136:17-27. doi: 10.1016/j.neunet.2020.12.017. Epub 2020 Dec 25.
9
Deep Convolutional Generative Adversarial Network (dcGAN) Models for Screening and Design of Small Molecules Targeting Cannabinoid Receptors.
Mol Pharm. 2019 Nov 4;16(11):4451-4460. doi: 10.1021/acs.molpharmaceut.9b00500. Epub 2019 Oct 24.
10
Generative adversarial networks with decoder-encoder output noises.
Neural Netw. 2020 Jul;127:19-28. doi: 10.1016/j.neunet.2020.04.005. Epub 2020 Apr 9.

Cited by

1
Dissociating language and thought in large language models.
Trends Cogn Sci. 2024 Jun;28(6):517-540. doi: 10.1016/j.tics.2024.01.011. Epub 2024 Mar 19.
2
Encoding of speech in convolutional layers and the brain stem based on language experience.
Sci Rep. 2023 Apr 20;13(1):6480. doi: 10.1038/s41598-023-33384-9.
3
Toward understanding the communication in sperm whales.
iScience. 2022 May 13;25(6):104393. doi: 10.1016/j.isci.2022.104393. eCollection 2022 Jun 17.