Cognitive and Information Sciences, University of California, Merced, 5200 North Lake Rd., Merced, CA 95343, USA
EuroMov Laboratory, Université de Montpellier, 700 Avenue du Pic Saint-Loup, 34090 Montpellier, France
J R Soc Interface. 2017 Oct;14(135). doi: 10.1098/rsif.2017.0231.
Humans talk, sing and play music. Some species of birds and whales sing long and complex songs. All these behaviours and sounds exhibit hierarchical structure: syllables and notes are positioned within words and musical phrases, words and motives within sentences and larger musical units, and so on. We developed a new method to measure and compare hierarchical temporal structures in speech, song and music. The method identifies temporal events as peaks in the sound amplitude envelope, and quantifies event clustering across a range of timescales using Allan factor (AF) variance. AF variances were analysed and compared for over 200 recordings from more than 16 categories of signals, including recordings of speech in different contexts and languages, and musical compositions and performances from different genres. Non-human vocalizations from two bird species and two types of marine mammals were also analysed for comparison. The resulting patterns of AF variance across timescales were distinct for each of four natural categories of complex sound: speech, popular music, classical music and complex animal vocalizations. Comparisons within and across categories indicated that nested clustering at longer timescales was more prominent when prosodic variation was greater, and when sounds came from interactions among individuals, including interactions between speakers, between musicians, and even between killer whales. Nested clustering was also more prominent for music than for speech, reflecting beat structure in popular music and self-similarity across timescales in classical music. In summary, hierarchical temporal structures reflect the behavioural and social processes underlying complex vocalizations and musical performances.
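The method as described has two steps: detect temporal events as peaks in the amplitude envelope, then compute the Allan factor variance of the resulting event train over a range of window sizes T, where AF(T) = E[(N_{i+1}(T) - N_i(T))^2] / (2 E[N(T)]) and N_i(T) is the event count in the i-th non-overlapping window of width T. The Python sketch below illustrates both steps under stated assumptions: the Hilbert-transform envelope, the find_peaks settings, and the function names are illustrative choices, not the authors' exact implementation.

```python
import numpy as np
from scipy.signal import hilbert, find_peaks

def envelope_peak_times(signal, fs, min_separation=0.01):
    """Detect temporal events as peaks in the amplitude envelope.
    The envelope here is the magnitude of the analytic signal (Hilbert
    transform); the paper's exact smoothing and thresholding choices
    are not reproduced."""
    envelope = np.abs(hilbert(signal))
    peaks, _ = find_peaks(envelope, distance=max(1, int(min_separation * fs)))
    return peaks / fs  # event times in seconds

def allan_factor(event_times, duration, timescales):
    """Allan factor variance of an event train:
        AF(T) = E[(N_{i+1}(T) - N_i(T))^2] / (2 * E[N(T)])
    where N_i(T) counts events in the i-th non-overlapping window of
    width T. AF(T) stays near 1 for a Poisson process; values growing
    with T indicate nested clustering across timescales."""
    event_times = np.asarray(event_times)
    af = np.full(len(timescales), np.nan)
    for k, T in enumerate(timescales):
        n_windows = int(duration // T)
        if n_windows < 2:
            continue  # too few windows to form count differences
        counts, _ = np.histogram(event_times, bins=n_windows,
                                 range=(0.0, n_windows * T))
        diffs = np.diff(counts.astype(float))
        mean_count = counts.mean()
        if mean_count > 0:
            af[k] = np.mean(diffs ** 2) / (2.0 * mean_count)
    return af

# Sanity check: a homogeneous Poisson event train should give AF(T) ~ 1
# at every timescale, unlike the clustered signals analysed in the paper.
rng = np.random.default_rng(0)
events = np.sort(rng.uniform(0.0, 120.0, size=2000))
timescales = np.logspace(-1.5, 1.0, 10)  # roughly 0.03 s to 10 s
print(allan_factor(events, duration=120.0, timescales=timescales))
```

In this sketch, an AF(T) curve that rises with T signals event clusters nested within larger clusters, which is the pattern the abstract reports as distinguishing speech, popular music, classical music and complex animal vocalizations.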