Judging the Number and Gender of Talkers Present in an Auditory Scene Aided by Acoustic Beamforming.

Author information

Byrne Andrew J, Kidd Gerald

Affiliations

Department of Speech, Language, & Hearing Sciences and Hearing Research Center, Boston University, Boston, MA, USA.

Department of Otolaryngology, Head-Neck Surgery, Medical University of South Carolina, Charleston, SC, USA.

Publication information

Trends Hear. 2025 Jan-Dec;29:23312165251329791. doi: 10.1177/23312165251329791. Epub 2025 May 29.

DOI:10.1177/23312165251329791

PMID:40438002

Abstract

The perceived numerosity of simultaneous, spatially separated speech sources was used to evaluate the effectiveness of triple beamformer processing, compared to that of both a single-channel beamformer and natural listening. Participants made judgments of the total number of talkers present in a simulated sound field and the gender composition of the talker group. The perceived numerosity was always underestimated for groups of more than three talkers. Performance with the triple beamformer was roughly equivalent to that of natural listening, including a beneficial effect of spatial separation of the sources in azimuth. The gender mix of the talker group also affected the numerosity judgments, although the perceived gender ratio was generally accurate even when the total group count was underestimated. Time-reversing the speech resulted in lower numerosity judgments (increased error) under both natural and triple beamformer listening, suggesting an influence of linguistic processing on source numerosity judgments. Overall, factors that enhanced source segregation and speech stream coherence decreased errors in numerosity judgments. A stimulus-derived metric, the composite of glimpsed energy retained for all talkers in the sound field, was found to be a reasonably accurate predictor of the subjective numerosity judgments.
