Centre for Language and Speech Technology, Radboud University Nijmegen, Erasmusplein 1, 6525 HT Nijmegen, The Netherlands.
J Acoust Soc Am. 2010 Feb;127(2):1084-95. doi: 10.1121/1.3277194.
Despite using different algorithms, most unsupervised automatic phone segmentation methods achieve similar performance in terms of the percentage of correctly detected boundaries. Nevertheless, unsupervised segmentation algorithms are not able to perfectly reproduce manually obtained reference transcriptions. This paper investigates fundamental problems for unsupervised segmentation algorithms by comparing a phone segmentation obtained using only the acoustic information present in the signal with a reference segmentation created by human transcribers. Analyses of the output of an unsupervised speech segmentation method that uses acoustic change to hypothesize boundaries showed that acoustic change is a fairly good indicator of segment boundaries: over two-thirds of the hypothesized boundaries coincide with reference segment boundaries. Statistical analyses showed that the errors are related to segment duration, sequences of similar segments, and inherently dynamic phones. To improve unsupervised automatic speech segmentation, current one-stage bottom-up segmentation methods should be expanded into two-stage segmentation methods that are able to use a mix of bottom-up information extracted from the speech signal and automatically derived top-down information. In this way, unsupervised methods can be improved while remaining flexible and language-independent.
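The abstract does not specify the acoustic features, distance measure, or peak-picking rule of the segmentation method, so the following is only a generic sketch of the idea, not the authors' implementation. It assumes one feature vector per frame (for example MFCCs), measures acoustic change as the Euclidean distance between consecutive frames, and hypothesizes a boundary at each local peak of that change curve above a threshold. It then scores the hypotheses against a reference segmentation: "precision" is the fraction of hypothesized boundaries that fall within a tolerance of a reference boundary, which corresponds to the quantity the abstract reports as over two-thirds. The function names, the threshold, the tolerance (in frames), and the greedy one-to-one matching rule are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def hypothesize_boundaries(features: Sequence[Sequence[float]],
                           threshold: float) -> List[int]:
    """Hypothetical helper: boundaries at peaks of frame-to-frame acoustic change.

    features: one feature vector per frame (e.g. MFCCs; an assumption here).
    Returns frame indices at which a new segment is assumed to start.
    """
    # change[i] is the Euclidean distance between frames i and i + 1
    change = [math.dist(a, b) for a, b in zip(features, features[1:])]
    boundaries = []
    # Only interior points of the change curve are considered peak candidates;
    # strict '>' on the left keeps just the first point of a flat-topped peak
    for i in range(1, len(change) - 1):
        is_peak = change[i] > change[i - 1] and change[i] >= change[i + 1]
        if is_peak and change[i] > threshold:
            boundaries.append(i + 1)  # the new segment starts at frame i + 1
    return boundaries


def evaluate_boundaries(hyp: Sequence[int], ref: Sequence[int],
                        tolerance: int) -> Tuple[float, float]:
    """Hypothetical helper: (precision, recall) with one-to-one matching.

    A hypothesized boundary counts as correct if an as-yet-unmatched
    reference boundary lies within `tolerance` frames of it.
    """
    matched_ref = set()  # indices into ref that have already been matched
    hits = 0
    for h in sorted(hyp):
        candidates = [j for j, r in enumerate(ref)
                      if j not in matched_ref and abs(h - r) <= tolerance]
        if candidates:
            # Greedily take the closest unmatched reference boundary
            best = min(candidates, key=lambda j: abs(h - ref[j]))
            matched_ref.add(best)
            hits += 1
    precision = hits / len(hyp) if hyp else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


# Toy usage: three "segments" of constant 1-D features
frames = [[0.0]] * 4 + [[5.0]] * 4 + [[1.0]] * 4
hyp = hypothesize_boundaries(frames, threshold=1.0)
print(hyp)                                                # [4, 8]
print(evaluate_boundaries(hyp, ref=[4, 9], tolerance=1))  # (1.0, 1.0)
print(evaluate_boundaries(hyp, ref=[4, 9], tolerance=0))  # (0.5, 0.5)
```

With real speech the change curve is noisy, so the threshold and tolerance strongly affect the scores. The one-to-one matching prevents two hypotheses from claiming the same reference boundary. A looser many-to-one rule would inflate precision whenever the detector fires several times around one transition.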