Department of Neurological Surgery, University of California, San Francisco, California 94122, USA.
J Acoust Soc Am. 2011 Jul;130(1):EL14-8. doi: 10.1121/1.3595744.
A multistream phoneme recognition framework is proposed in which the streams are formed from different spectrotemporal modulations of speech. Phoneme posterior probabilities are estimated from each stream separately and combined at the output level. A statistical model of the final estimated posterior probabilities is used to characterize system performance. During operation, the best fusion architecture is chosen automatically to maximize the similarity of the output statistics to those obtained under clean conditions. Phoneme recognition results on noisy speech indicate the effectiveness of the proposed method.
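The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the statistical model, the similarity measure, or the set of candidate fusion architectures, so everything below is an assumption made for illustration: the candidates are all non-empty subsets of streams, fusion averages log-posteriors, the output statistic is the average outer product of frame posteriors, and similarity is the negative Frobenius distance to a statistic computed on clean data. The function names `fuse`, `output_statistic`, and `select_fusion` are hypothetical.

```python
from itertools import combinations

import numpy as np


def fuse(posteriors):
    """Combine per-stream posteriors (each T x P) by averaging log-probabilities.

    Assumed fusion rule (geometric mean); not taken from the paper.
    """
    log_mean = np.mean([np.log(p + 1e-12) for p in posteriors], axis=0)
    fused = np.exp(log_mean)
    return fused / fused.sum(axis=1, keepdims=True)  # renormalize each frame


def output_statistic(post):
    """Assumed statistic: average outer product of frame posteriors (P x P)."""
    return post.T @ post / post.shape[0]


def select_fusion(streams, clean_stat):
    """Try every non-empty subset of streams and keep the one whose fused
    output statistic is closest (Frobenius norm) to the clean-condition one."""
    best = None
    for k in range(1, len(streams) + 1):
        for subset in combinations(range(len(streams)), k):
            fused = fuse([streams[i] for i in subset])
            dist = np.linalg.norm(output_statistic(fused) - clean_stat)
            if best is None or dist < best[0]:
                best = (dist, subset, fused)
    return best[1], best[2]


# Toy data: stream 0 gives confident posteriors, stream 1 is uninformative.
clean = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])
noisy = np.full_like(clean, 1.0 / 3.0)

best_subset, fused = select_fusion([clean, noisy], output_statistic(clean))
print(best_subset)  # prints (0,)
```

In this toy case the uninformative stream is excluded because including it flattens the fused posteriors and moves their statistic away from the clean reference. The exhaustive search over subsets grows exponentially with the number of streams, so it is only practical for a small number of streams.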