Kortekaas R W, Hermes D J, Meyer G F
Institute for Perception Research/IPO, Eindhoven, The Netherlands.
J Acoust Soc Am. 1996 Feb;99(2):1185-99. doi: 10.1121/1.414671.
An algorithm for detection of vowel onsets in fluent speech was presented by Hermes [J. Acoust. Soc. Am. 87, 866-873 (1990)]. Performance tests showed that detection was good for fluent speech, although the parameter settings had to be modified for application to well-articulated speech. One of the purposes of the algorithm was application to speech by deaf persons, for which it failed completely. In order to improve the algorithm and to make it more generally applicable, two alternative detection strategies have been explored in the present study. These strategies were (a) simulation of transient-chopper responses in the cochlear nucleus and (b) training of multilayer perceptrons. Two large databases of read speech have been used for performance comparison of the original algorithm and the new strategies. The strategy based on simulating cochlear-nucleus responses is found both to result in a higher false-alarm rate than the original algorithm and to be rather level dependent. On the other hand, the performance of a multilayer-perceptron network, trained on mel-scaled spectra, is comparable to the performance of the Hermes algorithm. In more general terms, the results suggest that temporal information on intensity and (rough) spectral envelope is important for human vowel-onset detection behavior. Information on harmonicity can be used as a secondary source of information to avoid detection of mainly unvoiced, nonvowel onsets.