Durojaye Cecilia, Fink Lauren, Roeske Tina, Wald-Fuhrmann Melanie, Larrouy-Maestri Pauline
Department of Music, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany.
Department of Psychology, Arizona State University, Tempe, AZ, United States.
Front Psychol. 2021 May 20;12:652673. doi: 10.3389/fpsyg.2021.652673. eCollection 2021.
It seems trivial to identify sound sequences as music or speech, particularly when the sequences come from different sound sources, such as an orchestra and a human voice. Can we also easily distinguish these categories when the sequences come from the same sound source, and on the basis of which acoustic features? We investigated these questions by examining listeners' classification of sound sequences performed on an instrument that intertwines speech and music: the dùndún talking drum. The dùndún is commonly used in south-west Nigeria as a musical instrument but is also well suited for linguistic use as one of the speech surrogates described in Africa. One hundred seven participants from diverse geographical locations (15 different mother tongues represented) took part in an online experiment. Fifty-one participants reported being familiar with the dùndún talking drum, 55% of them speakers of Yorùbá. During the experiment, participants listened to 30 dùndún samples of about 7 s each, performed either as music or as Yorùbá speech surrogate (n = 15 each) by a professional musician, and were asked to classify each sample as music or speech-like. The classification task revealed the ability of listeners to identify the samples as intended by the performer, particularly when they were familiar with the dùndún, though even unfamiliar participants performed above chance. A logistic regression predicting participants' classification of the samples from several acoustic features confirmed the perceptual relevance of intensity, pitch, timbre, and timing measures and their interaction with listener familiarity. Altogether, this study provides empirical evidence supporting the discriminating role of acoustic features and the modulatory role of familiarity in teasing apart speech and music.