
Understanding Adaptive, Multiscale Temporal Integration in Deep Speech Recognition Systems

Authors

Keshishian Menoua, Norman-Haignere Sam V, Mesgarani Nima

Affiliation

Department of Electrical Engineering, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027.

Publication

Adv Neural Inf Process Syst. 2021 Dec;34:24455-24467.

PMID: 38737583
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11087060/
Abstract

Natural signals such as speech are hierarchically structured across many different timescales, spanning tens (e.g., phonemes) to hundreds (e.g., words) of milliseconds, each of which is highly variable and context-dependent. While deep neural networks (DNNs) excel at recognizing complex patterns from natural signals, relatively little is known about how DNNs flexibly integrate across multiple timescales. Here, we show how a recently developed method for studying temporal integration in biological neural systems - the temporal context invariance (TCI) paradigm - can be used to understand temporal integration in DNNs. The method is simple: we measure responses to a large number of stimulus segments presented in two different contexts and estimate the smallest segment duration needed to achieve a context invariant response. We applied our method to understand how the popular DeepSpeech2 model learns to integrate across time in speech. We find that nearly all of the model units, even in recurrent layers, have a compact integration window within which stimuli substantially alter the response and outside of which stimuli have little effect. We show that training causes these integration windows to shrink at early layers and expand at higher layers, creating a hierarchy of integration windows across the network. Moreover, by measuring integration windows for time-stretched/compressed speech, we reveal a transition point, midway through the trained network, where integration windows become yoked to the duration of stimulus structures (e.g., phonemes or words) rather than absolute time. Similar phenomena were observed in a purely recurrent and purely convolutional network although structure-yoked integration was more prominent in the recurrent network. These findings suggest that deep speech recognition systems use a common motif to encode the hierarchical structure of speech: integrating across short, time-yoked windows at early layers and long, structure-yoked windows at later layers. Our method provides a straightforward and general-purpose toolkit for understanding temporal integration in black-box machine learning models.
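The TCI procedure described in the abstract lends itself to a compact implementation. Below is a minimal, illustrative Python sketch of the analysis, not the authors' code: it substitutes a toy one-dimensional convolutional unit for a real DeepSpeech2 layer, and the function names (`toy_unit`, `cross_context_correlation`, `estimate_integration_window`), the 0.95 invariance threshold, and all durations and sample counts are hypothetical choices for illustration.

```python
# Minimal sketch of a temporal context invariance (TCI) analysis.
# Assumptions: a toy 1-D convolutional unit stands in for a DNN unit;
# signals are white noise; responses are summarized by their mean.
import numpy as np

rng = np.random.default_rng(0)
KERNEL = rng.standard_normal(15)  # toy unit with a ~15-sample receptive field


def toy_unit(signal: np.ndarray) -> np.ndarray:
    """Stand-in for a single model unit: fixed 1-D convolution + ReLU."""
    return np.maximum(np.convolve(signal, KERNEL, mode="same"), 0.0)


def cross_context_correlation(seg_len: int, n_segments: int = 200,
                              context_len: int = 100) -> float:
    """Embed the same segments in two different random contexts and
    correlate the unit's responses to the shared segments across contexts."""
    resp_a, resp_b = [], []
    for _ in range(n_segments):
        seg = rng.standard_normal(seg_len)
        for resp in (resp_a, resp_b):
            # A fresh random context on each presentation.
            full = np.concatenate([rng.standard_normal(context_len), seg,
                                   rng.standard_normal(context_len)])
            # Keep only the response time-aligned to the shared segment.
            out = toy_unit(full)[context_len:context_len + seg_len]
            resp.append(out.mean())  # scalar summary of the segment response
    return float(np.corrcoef(resp_a, resp_b)[0, 1])


def estimate_integration_window(durations, threshold: float = 0.95) -> int:
    """Smallest tested segment duration whose cross-context correlation
    clears the context-invariance threshold."""
    for d in sorted(durations):
        if cross_context_correlation(d) >= threshold:
            return d
    return max(durations)


print(estimate_integration_window([5, 10, 20, 40, 80]))
```

In the paper's terms, the returned duration approximates the unit's integration window: the shortest segment length at which stimuli outside the segment no longer substantially alter the response.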


Figures (from the PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ff/11087060/0c02660db178/nihms-1849108-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ff/11087060/d8a3985bbe81/nihms-1849108-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ff/11087060/004511a2c3c4/nihms-1849108-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ff/11087060/0241e50a83f6/nihms-1849108-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ff/11087060/9077ccc896cb/nihms-1849108-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ff/11087060/2d8a6d345557/nihms-1849108-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ff/11087060/d0ce9300d7e0/nihms-1849108-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ff/11087060/255cf5b20125/nihms-1849108-f0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ff/11087060/8f270ec50d61/nihms-1849108-f0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ff/11087060/bfd3060fec1b/nihms-1849108-f0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ff/11087060/f66147d8bdbe/nihms-1849108-f0011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ff/11087060/475cefb1b664/nihms-1849108-f0012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ff/11087060/336bc68aa929/nihms-1849108-f0013.jpg

Similar Articles

1. Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems.
Adv Neural Inf Process Syst. 2021 Dec;34:24455-24467.
2. Large language models transition from integrating across position-yoked, exponential windows to structure-yoked, power-law windows.
Adv Neural Inf Process Syst. 2023 Dec;36:638-654.
3. Temporal integration in human auditory cortex is predominantly yoked to absolute time, not structure duration.
bioRxiv. 2024 Sep 24:2024.09.23.614358. doi: 10.1101/2024.09.23.614358.
4. Multiscale temporal integration organizes hierarchical computation in human auditory cortex.
Nat Hum Behav. 2022 Mar;6(3):455-469. doi: 10.1038/s41562-021-01261-y. Epub 2022 Feb 10.
5. Human EEG and Recurrent Neural Networks Exhibit Common Temporal Dynamics During Speech Recognition.
Front Syst Neurosci. 2021 Jul 8;15:617605. doi: 10.3389/fnsys.2021.617605. eCollection 2021.
6. Deep convolutional neural network and IoT technology for healthcare.
Digit Health. 2024 Jan 17;10:20552076231220123. doi: 10.1177/20552076231220123. eCollection 2024 Jan-Dec.
7. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech, and music.
Front Psychol. 2014 Aug 7;5:868. doi: 10.3389/fpsyg.2014.00868. eCollection 2014.
8. Performance of a Computational Model of the Mammalian Olfactory System.
9. Integrating speech in time depends on temporal expectancies and attention.
Cortex. 2017 Aug;93:28-40. doi: 10.1016/j.cortex.2017.05.001. Epub 2017 May 19.
10. Analyzing Distributional Learning of Phonemic Categories in Unsupervised Deep Neural Networks.
Cogsci. 2016 Aug;2016:1757-1762.

Cited By

1. Neurons in auditory cortex integrate information within constrained temporal windows that are invariant to the stimulus context and information rate.
bioRxiv. 2025 Feb 14:2025.02.14.637944. doi: 10.1101/2025.02.14.637944.
2. Parallel hierarchical encoding of linguistic representations in the human auditory cortex and recurrent automatic speech recognition systems.
bioRxiv. 2025 Feb 1:2025.01.30.635775. doi: 10.1101/2025.01.30.635775.
3. Temporal integration in human auditory cortex is predominantly yoked to absolute time, not structure duration.
bioRxiv. 2024 Sep 24:2024.09.23.614358. doi: 10.1101/2024.09.23.614358.

References

1. Multiscale temporal integration organizes hierarchical computation in human auditory cortex.
Nat Hum Behav. 2022 Mar;6(3):455-469. doi: 10.1038/s41562-021-01261-y. Epub 2022 Feb 10.
2. Estimating and interpreting nonlinear receptive field of sensory neural responses with deep neural network models.
Elife. 2020 Jun 26;9:e53445. doi: 10.7554/eLife.53445.
3. Spiking network optimized for word recognition in noise predicts auditory system hierarchy.
PLoS Comput Biol. 2020 Jun 19;16(6):e1007558. doi: 10.1371/journal.pcbi.1007558. eCollection 2020 Jun.
4. Constructing and Forgetting Temporal Context in the Human Cerebral Cortex.
Neuron. 2020 May 20;106(4):675-686.e11. doi: 10.1016/j.neuron.2020.02.013. Epub 2020 Mar 11.
5. Understanding the Representation and Computation of Multilayer Perceptrons: A Case Study in Speech Recognition.
Proc Mach Learn Res. 2017 Aug;70:2564-2573.
6. A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy.
Neuron. 2018 May 2;98(3):630-644.e16. doi: 10.1016/j.neuron.2018.03.044. Epub 2018 Apr 19.
7. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation.
PLoS One. 2015 Jul 10;10(7):e0130140. doi: 10.1371/journal.pone.0130140. eCollection 2015.
8. Opening the black box: low-dimensional dynamics in high-dimensional recurrent neural networks.
Neural Comput. 2013 Mar;25(3):626-49. doi: 10.1162/NECO_a_00409. Epub 2012 Dec 28.
9. A hierarchy of temporal receptive windows in human cortex.
J Neurosci. 2008 Mar 5;28(10):2539-50. doi: 10.1523/JNEUROSCI.5487-07.2008.
10. The cortical organization of speech processing.
Nat Rev Neurosci. 2007 May;8(5):393-402. doi: 10.1038/nrn2113. Epub 2007 Apr 13.