Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, USA.
R Soc Open Sci. 2015 Dec 23;2(12):150432. doi: 10.1098/rsos.150432. eCollection 2015 Dec.
Individual primates can be identified by the sound of their voice. Macaques have demonstrated an ability to discern conspecific identity from a harmonically structured 'coo' call. Voice recognition presumably requires the integrated perception of multiple acoustic features. However, it is unclear how this is achieved, given considerable variability across utterances. Specifically, the extent to which information about caller identity is distributed across multiple features remains elusive. We examined these issues by recording and analysing a large sample of calls from eight macaques. Single acoustic features, including fundamental frequency, duration and Wiener entropy, were informative but unreliable for the statistical classification of caller identity. A combination of multiple features, however, allowed for highly accurate caller identification. A regularized classifier that learned to identify callers from the modulation power spectrum of calls found that specific regions of spectral-temporal modulation were informative for caller identification. These ranges are related to acoustic features such as the call's fundamental frequency and FM sweep direction. We further found that the low-frequency spectrotemporal modulation component contained an indexical cue to the caller's body size. Thus, cues for caller identity are distributed across identifiable spectrotemporal components corresponding to laryngeal and supralaryngeal components of vocalizations, and the integration of those cues can enable highly reliable caller identification. Our results demonstrate a clear acoustic basis by which individual macaque vocalizations can be recognized.
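Two of the quantities named in the abstract, Wiener entropy and the modulation power spectrum, have simple textbook definitions that can be sketched in a few lines. The code below is an illustration only and is not taken from the paper: the function names, the frame length, the hop size and the Hann window are all assumptions, and the authors' actual analysis parameters are not stated here. Wiener entropy is computed as the log ratio of the geometric to the arithmetic mean of the power spectrum, and the modulation power spectrum as the squared magnitude of the 2D Fourier transform of a log spectrogram.

```python
import numpy as np

def wiener_entropy(signal, eps=1e-12):
    """Log spectral flatness: log(geometric mean / arithmetic mean) of the
    power spectrum. Always <= 0; near 0 for noise, very negative for tones."""
    power = np.abs(np.fft.rfft(signal)) ** 2 + eps
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    return float(np.log(geometric_mean / arithmetic_mean))

def spectrogram(signal, frame_len=256, hop=64):
    """Power spectrogram (frequency x time) from Hann-windowed frames.
    frame_len and hop are illustrative values, not the paper's settings."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

def modulation_power_spectrum(signal, frame_len=256, hop=64, eps=1e-12):
    """Squared magnitude of the 2D FFT of the mean-removed log spectrogram.
    Rows index spectral modulation, columns index temporal modulation."""
    log_spec = np.log(spectrogram(signal, frame_len, hop) + eps)
    log_spec -= log_spec.mean()
    return np.abs(np.fft.fftshift(np.fft.fft2(log_spec))) ** 2

# Synthetic stand-ins for a recording: a pure tone and white noise.
fs = 16000
t = np.arange(fs // 2) / fs
tone = np.sin(2 * np.pi * 500 * t)
noise = np.random.default_rng(0).standard_normal(t.size)

print("Wiener entropy, tone: ", wiener_entropy(tone))
print("Wiener entropy, noise:", wiener_entropy(noise))
print("MPS shape:", modulation_power_spectrum(tone).shape)
```

In a classification setting of the kind the abstract describes, each call's modulation power spectrum could be flattened into a feature vector and passed to a regularized classifier; that step is omitted here because the classifier and its regularization are not specified in this record.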