Ananthapadmanabha T V, Prathosh A P, Ramakrishnan A G
Voice and Speech Systems, Temple Road, Malleshwaram, Bangalore 560003, India.
Department of Electrical Engineering, Indian Institute of Science, Bangalore 560012, India.
J Acoust Soc Am. 2014 Jan;135(1):460-71. doi: 10.1121/1.4836055.
Automatic and accurate detection of the closure-burst transition events of stops and affricates serves many applications in speech processing. A temporal measure named the plosion index is proposed to detect such events, which are characterized by an abrupt increase in energy. Using the maxima of the pitch-synchronous normalized cross correlation as an additional temporal feature, a rule-based algorithm is designed that aims at selecting only those events associated with the closure-burst transitions of stops and affricates. The performance of the algorithm, characterized by receiver operating characteristic curves and temporal accuracy, is evaluated using the labeled closure-burst transitions of stops and affricates of the entire TIMIT test and training databases. The robustness of the algorithm is studied with respect to global white and babble noise as well as local noise using the TIMIT test set, and on telephone-quality speech using the NTIMIT test set. For these experiments, the proposed algorithm, which does not require explicit statistical training and is based on two one-dimensional temporal measures, gives a performance comparable to or better than that of the state-of-the-art methods. In addition, to test the scalability, the algorithm is applied to the Buckeye conversational speech corpus and to databases of two Indian languages.
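To make the idea of a temporal measure that responds to "an abrupt increase in energy" concrete, the following is a minimal sketch. It is not the plosion index as defined in the paper, since the abstract does not give the formula. The function names `abruptness_ratio` and `detect_onsets`, the parameters `gap`, `window`, and `threshold`, and the ratio itself are all assumptions chosen for illustration. The sketch compares the absolute amplitude at a sample with the mean absolute amplitude of a short stretch of preceding samples.

```python
def abruptness_ratio(signal, n0, gap, window):
    """Hypothetical onset measure (an assumption, not the paper's definition).

    Returns |signal[n0]| divided by the mean absolute amplitude of the
    `window` samples that end `gap` samples before n0.
    """
    start = n0 - gap - window
    if start < 0:
        raise ValueError("not enough samples before n0")
    past = signal[start:n0 - gap]
    mean_abs = sum(abs(x) for x in past) / window
    if mean_abs == 0.0:
        # Avoid division by zero on digital silence.
        return float("inf") if signal[n0] != 0 else 0.0
    return abs(signal[n0]) / mean_abs


def detect_onsets(signal, gap, window, threshold):
    """Return the sample indices whose ratio exceeds `threshold`."""
    first = gap + window  # earliest index with enough history
    return [n for n in range(first, len(signal))
            if abruptness_ratio(signal, n, gap, window) > threshold]


# Toy input: a quiet "closure" followed by a loud "burst" at sample 100.
toy = [0.01] * 100 + [1.0] * 20
onsets = detect_onsets(toy, gap=5, window=20, threshold=10.0)
```

On this toy input the earliest flagged index is the first loud sample. A real detector would also need the second feature mentioned in the abstract (the pitch-synchronous normalized cross correlation) and the rule-based selection step, neither of which is sketched here.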