Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA 15213, USA.
Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA 15213, USA.
Nat Commun. 2019 Mar 21;10(1):1302. doi: 10.1038/s41467-019-09115-y.
Humans and vocal animals use vocalizations to communicate with members of their species. A necessary function of auditory perception is to generalize across the high variability inherent in vocalization production and classify vocalizations into behaviorally distinct categories ('words' or 'call types'). Here, we demonstrate that detecting mid-level features in calls achieves production-invariant classification. Starting from randomly chosen marmoset call features, we use a greedy search algorithm to determine the most informative and least redundant features necessary for call classification. High classification performance is achieved using only 10-20 features per call type. Predictions of the tuning properties of putative feature-selective neurons accurately match some observed auditory cortical responses. This feature-based approach also succeeds for call categorization in other species, and for other complex classification tasks such as caller identification. Our results suggest that high-level neural representations of sounds are based on task-dependent features optimized for specific computational goals.
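To illustrate the kind of greedy search described in the abstract, the sketch below implements a generic relevance-minus-redundancy (mRMR-style) selection loop: at each step it adds the candidate feature most informative about the call-type label while penalizing redundancy with already-selected features. This is a minimal illustration only, not the authors' actual algorithm or feature set; the synthetic data, the function `greedy_feature_selection`, and the discretization choices are assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score


def greedy_feature_selection(X, y, n_features=15):
    """Greedily select features informative about the label (relevance)
    while penalizing overlap with already-selected features (redundancy)."""

    def discretize(col, n_bins=8):
        # Bin a continuous feature so mutual information can be estimated.
        edges = np.quantile(col, np.linspace(0, 1, n_bins)[1:-1])
        return np.digitize(col, edges)

    # Relevance: mutual information between each candidate feature and the label.
    relevance = mutual_info_classif(X, y, random_state=0)

    selected = [int(np.argmax(relevance))]          # seed with the most informative feature
    remaining = [j for j in range(X.shape[1]) if j not in selected]

    while len(selected) < n_features and remaining:
        best_score, best_idx = -np.inf, None
        for j in remaining:
            # Redundancy: mean mutual information with the features already chosen.
            redundancy = np.mean([
                mutual_info_score(discretize(X[:, j]), discretize(X[:, s]))
                for s in selected
            ])
            score = relevance[j] - redundancy       # mRMR-style difference criterion
            if score > best_score:
                best_score, best_idx = score, j
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Synthetic stand-in for spectrotemporal call features (200 calls x 50 candidates),
# with a binary label: target call type vs. all others.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 2, size=200)
print(greedy_feature_selection(X, y, n_features=10))
```

In this formulation, stopping after 10-20 selected features per call type mirrors the small feature budget reported in the abstract; the scoring criterion and binning scheme here are generic choices, not those of the paper.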