Comparison of a target-equalization-cancellation approach and a localization approach to source separation.

Author information

Mi Jing, Groll Matti, Colburn H Steven

Affiliation

Hearing Research Center, Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, Massachusetts 02215, USA.

Publication information

J Acoust Soc Am. 2017 Nov;142(5):2933. doi: 10.1121/1.5009763.

DOI:10.1121/1.5009763

PMID:29195469

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5685812/

Abstract

Interaural differences are important for listeners to be able to maintain focus on a sound source of interest in the presence of multiple sources. Because interaural differences are sound localization cues, most binaural-cue-based source separation algorithms attempt separation by localizing each time-frequency (T-F) unit to one of the possible source directions using interaural differences. By assembling T-F units that are assigned to one direction, the sound stream from that direction is enhanced. In this paper, a different type of binaural cue for source-separation purposes is proposed. For each T-F unit, the equalization-cancellation (EC) operation is applied to cancel the signal from the target direction; the dominance of the target in that unit is then determined by the effectiveness of the cancellation. Specifically, the energy change from cancellation is used as the criterion for target dominance for each T-F unit. Source-separation performance using the target-EC cue is compared with performance using localization cues. With simulated multi-talker and diffuse-babble interferers, the algorithm based on target-EC cues yields better source-separation performance than the algorithm based on localization cues, both in direct comparison with the ideal binary mask and in measured speech intelligibility for the separated target streams.
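
The target-EC cue described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the front end, the EC parameters, or the decision threshold, so everything below is an assumption made for illustration: the function name `target_ec_mask`, the use of complex STFT bins as T-F units, the sign conventions for the target interaural time and level differences, the energy normalization, and the -10 dB threshold.

```python
import numpy as np


def target_ec_mask(left_tf, right_tf, freqs_hz, target_itd_s,
                   target_ild_db=0.0, threshold_db=-10.0):
    """Binary target mask from a target-direction EC operation (illustrative).

    left_tf, right_tf : complex arrays, shape (n_freqs, n_frames); one
        complex value per T-F unit (assumed STFT-like representation).
    freqs_hz          : center frequency of each row, shape (n_freqs,).
    target_itd_s      : assumed convention: delay of the right channel
        relative to the left for the target direction, in seconds.
    target_ild_db     : assumed convention: left level minus right level
        for the target direction, in dB.
    threshold_db      : hypothetical decision threshold on the energy change.
    """
    left_tf = np.asarray(left_tf, dtype=complex)
    right_tf = np.asarray(right_tf, dtype=complex)
    freqs = np.asarray(freqs_hz, dtype=float)[:, None]

    # Equalization: undo the target's interaural delay and level difference
    # on the right channel so that target components match the left channel.
    gain = 10.0 ** (target_ild_db / 20.0)
    right_eq = right_tf * gain * np.exp(1j * 2.0 * np.pi * freqs * target_itd_s)

    # Cancellation: subtract the equalized channels; target energy is removed.
    residual = left_tf - right_eq

    # Energy change caused by cancellation, in dB (eps guards silent units).
    eps = 1e-12
    energy_in = np.abs(left_tf) ** 2 + np.abs(right_eq) ** 2
    energy_out = np.abs(residual) ** 2
    change_db = 10.0 * np.log10((energy_out + eps) / (energy_in + eps))

    # A large energy drop means the unit was dominated by the target.
    mask = change_db < threshold_db
    return mask, change_db
```

Under these assumptions, a unit containing only the target is cancelled almost completely and is marked as target-dominated, while a unit containing only an interferer from another direction keeps most of its energy and is rejected. Multiplying the mask with one ear's T-F representation and resynthesizing would give the enhanced target stream.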
