Christian E. Stilp, Rachel M. Theodore
Department of Psychological and Brain Sciences, University of Louisville, 317 Life Sciences Building, Louisville, KY, 40292, USA.
Department of Speech, Language, and Hearing Sciences, University of Connecticut, 2 Alethia Drive, Unit 1085, Storrs, CT, 06269-1085, USA.
Atten Percept Psychophys. 2020 Jul;82(5):2237-2243. doi: 10.3758/s13414-020-01971-x.
Speech perception is challenged by indexical variability. A litany of studies on talker normalization have demonstrated that hearing multiple talkers incurs processing costs (e.g., lower accuracy, increased response time) compared to hearing a single talker. However, when reframing these studies in terms of stimulus structure, it is evident that past tests of multiple-talker (i.e., low structure) and single-talker (i.e., high structure) conditions are not representative of the graded nature of indexical variation in the environment. Here we tested the hypothesis that processing costs incurred by multiple-talker conditions would abate given increased stimulus structure. We tested this hypothesis by manipulating the degree to which talkers' voices differed acoustically (Experiment 1) and also the frequency with which talkers' voices changed (Experiment 2) in multiple-talker conditions. Listeners performed a speeded classification task for words containing vowels that varied in acoustic-phonemic ambiguity. In Experiment 1, response times progressively decreased as acoustic variability among talkers' voices decreased. In Experiment 2, blocking talkers within mixed-talker conditions led to more similar response times among single-talker and multiple-talker conditions. Neither result interacted with acoustic-phonemic ambiguity of the target vowels. Thus, the results showed that indexical structure mediated the processing costs incurred by hearing different talkers. This is consistent with the Efficient Coding Hypothesis, which proposes that sensory and perceptual processing are facilitated by stimulus structure. Defining the roles and limits of stimulus structure on speech perception is an important direction for future research.