Dissecting neural computations in the human auditory pathway using deep neural networks for speech.

Affiliations

Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA.

School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China.

Publication information

Nat Neurosci. 2023 Dec;26(12):2213-2225. doi: 10.1038/s41593-023-01468-4. Epub 2023 Oct 30.

DOI:10.1038/s41593-023-01468-4

PMID:37904043

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10689246/

Abstract

The human auditory system extracts rich linguistic abstractions from speech signals. Traditional approaches to understanding this complex process have used linear feature-encoding models, with limited success. Artificial neural networks excel in speech recognition tasks and offer promising computational models of speech processing. We used speech representations in state-of-the-art deep neural network (DNN) models to investigate neural coding from the auditory nerve to the speech cortex. Representations in hierarchical layers of the DNN correlated well with the neural activity throughout the ascending auditory system. Unsupervised speech models performed at least as well as other purely supervised or fine-tuned models. Deeper DNN layers were better correlated with the neural activity in the higher-order auditory cortex, with computations aligned with phonemic and syllabic structures in speech. Accordingly, DNN models trained on either English or Mandarin predicted cortical responses in native speakers of each language. These results reveal convergence between DNN model representations and the biological auditory pathway, offering new approaches for modeling neural coding in the auditory cortex.
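The abstract does not state how DNN representations were compared with neural activity, so the following is only a minimal sketch of one common way to set up such a comparison: a linear encoding model, fit here with ridge regression, that predicts a recorded response from the features of one DNN layer and is scored by the held-out correlation. The ridge penalty, the train/test split, the layer names "early" and "deep", and all of the data (random synthetic stand-ins) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def encoding_score(features, response, alpha=1.0, train_frac=0.8):
    """Fit on the first part of the data; return Pearson r on the held-out rest."""
    n_train = int(len(response) * train_frac)
    # Standardize features using training statistics only.
    mu = features[:n_train].mean(axis=0)
    sd = features[:n_train].std(axis=0) + 1e-8
    X = (features - mu) / sd
    y_mean = response[:n_train].mean()
    w = fit_ridge(X[:n_train], response[:n_train] - y_mean, alpha)
    pred = X[n_train:] @ w + y_mean
    return float(np.corrcoef(pred, response[n_train:])[0, 1])


# Synthetic stand-ins (assumption): two hypothetical layers of random features
# and one simulated recording site whose response is driven by the "deep" layer.
rng = np.random.default_rng(0)
n_samples, n_dims = 2000, 16
layer_features = {
    "early": rng.standard_normal((n_samples, n_dims)),
    "deep": rng.standard_normal((n_samples, n_dims)),
}
true_weights = rng.standard_normal(n_dims)
response = layer_features["deep"] @ true_weights + 0.5 * rng.standard_normal(n_samples)

# One held-out correlation per layer; the better-matching layer scores higher.
scores = {name: encoding_score(F, response) for name, F in layer_features.items()}
print(scores)
```

In this toy setup the simulated site is built from the "deep" features, so that layer should score far higher than the unrelated "early" layer. With real recordings, comparing such per-layer scores across recording sites is one way to ask which model layer best matches which stage of the auditory pathway.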

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d8d/10689246/b44475f7d16a/41593_2023_1468_Fig1_HTML.jpg

Similar articles

Predicting EEG Responses to Attended Speech via Deep Neural Networks for Speech.

Annu Int Conf IEEE Eng Med Biol Soc. 2023 Jul;2023:1-4. doi: 10.1109/EMBC40787.2023.10340027.

Human cortical encoding of pitch in tonal and non-tonal languages.

Nat Commun. 2021 Feb 19;12(1):1161. doi: 10.1038/s41467-021-21430-x.

Neural Tuning to Low-Level Features of Speech throughout the Perisylvian Cortex.

J Neurosci. 2017 Aug 16;37(33):7906-7920. doi: 10.1523/JNEUROSCI.0238-17.2017. Epub 2017 Jul 17.

Musical training orchestrates coordinated neuroplasticity in auditory brainstem and cortex to counteract age-related declines in categorical vowel perception.

J Neurosci. 2015 Jan 21;35(3):1240-9. doi: 10.1523/JNEUROSCI.3292-14.2015.

Cortical Representations of Speech in a Multitalker Auditory Scene.

J Neurosci. 2017 Sep 20;37(38):9189-9196. doi: 10.1523/JNEUROSCI.0938-17.2017. Epub 2017 Aug 18.

Inferring Mechanisms of Auditory Attentional Modulation with Deep Neural Networks.

Neural Comput. 2022 Oct 7;34(11):2273-2293. doi: 10.1162/neco_a_01537.

Towards reconstructing intelligible speech from the human auditory cortex.

Sci Rep. 2019 Jan 29;9(1):874. doi: 10.1038/s41598-018-37359-z.

Coordinated plasticity in brainstem and auditory cortex contributes to enhanced categorical speech perception in musicians.

Eur J Neurosci. 2014 Aug;40(4):2662-73. doi: 10.1111/ejn.12627. Epub 2014 Jun 2.

Distinct roles of delta- and theta-band neural tracking for sharpening and predictive coding of multi-level speech features during spoken language processing.

Hum Brain Mapp. 2023 Dec 1;44(17):6149-6172. doi: 10.1002/hbm.26503. Epub 2023 Oct 11.

Cited by

Temporal integration in human auditory cortex is predominantly yoked to absolute time.

Nat Neurosci. 2025 Sep 18. doi: 10.1038/s41593-025-02060-8.

The detection of algebraic auditory structures emerges with self-supervised learning.

PLoS Comput Biol. 2025 Sep 5;21(9):e1013271. doi: 10.1371/journal.pcbi.1013271. eCollection 2025 Sep.

A Deep Neural Network Trained on Congruent Audiovisual Speech Reports the McGurk Effect.

bioRxiv. 2025 Aug 24:2025.08.20.671347. doi: 10.1101/2025.08.20.671347.

Deep neural networks explain spiking activity in auditory cortex.

PLoS Comput Biol. 2025 Aug 25;21(8):e1013334. doi: 10.1371/journal.pcbi.1013334. eCollection 2025 Aug.

Recurrent neural networks as neuro-computational models of human speech recognition.

PLoS Comput Biol. 2025 Jul 28;21(7):e1013244. doi: 10.1371/journal.pcbi.1013244. eCollection 2025 Jul.

Intrinsic dynamic shapes responses to external stimulation in the human brain.

Elife. 2025 Jul 3;14:RP104996. doi: 10.7554/eLife.104996.

Advances in functional magnetic resonance imaging-based brain function mapping: a deep learning perspective.

Psychoradiology. 2025 Apr 29;5:kkaf007. doi: 10.1093/psyrad/kkaf007. eCollection 2025.

Anti-drift pose tracker (ADPT), a transformer-based network for robust animal pose estimation cross-species.

Elife. 2025 May 6;13:RP95709. doi: 10.7554/eLife.95709.

A hierarchy of processing complexity and timescales for natural sounds in the human auditory cortex.

Proc Natl Acad Sci U S A. 2025 May 6;122(18):e2412243122. doi: 10.1073/pnas.2412243122. Epub 2025 Apr 28.

The role of musical aspects of language in human cognition.

Front Psychol. 2025 Mar 21;16:1505694. doi: 10.3389/fpsyg.2025.1505694. eCollection 2025.
