Cohen J R
J Acoust Soc Am. 1981 May;69(5):1430-8. doi: 10.1121/1.385826.
Speech is modeled as a Markov chain. Scoring is developed to convert observations of the speech signal into estimated probabilities of the locations of segment boundaries. Dynamic programming is then used to compute a most-probable segmentation for the speech. The process automatically adjusts to speakers and incorporates a priori information in a probabilistic and systematic fashion. The performance of the algorithm appears to be state-of-the-art, independent of speaker.
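The dynamic-programming step described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract gives neither the scoring functions nor the form of the a priori information. The sketch assumes that each position between frames already has an independent estimated boundary probability, and that prior knowledge enters as a log-prior on segment duration. The names `most_probable_segmentation`, `boundary_prob`, and `log_duration_prior` are hypothetical.

```python
import math


def most_probable_segmentation(boundary_prob, log_duration_prior, eps=1e-12):
    """Return (boundaries, log_score) for the highest-scoring segmentation.

    boundary_prob[k] is the assumed probability of a boundary between
    frame k and frame k + 1 (i.e. at position k + 1).
    log_duration_prior(d) is an assumed log-prior for a segment of d frames.
    """
    n_frames = len(boundary_prob) + 1
    # q[t]: clamped boundary probability at position t, for t in 1..n_frames-1.
    q = [None] + [min(max(p, eps), 1.0 - eps) for p in boundary_prob]

    # cum[t]: summed log-probability that positions 1..t are NOT boundaries.
    cum = [0.0] * n_frames
    for t in range(1, n_frames):
        cum[t] = cum[t - 1] + math.log(1.0 - q[t])

    # best[j]: best log-score of a segmentation of frames 0..j-1 whose last
    # segment ends at position j; back[j] is where that last segment starts.
    best = [-math.inf] * (n_frames + 1)
    back = [0] * (n_frames + 1)
    best[0] = 0.0
    for j in range(1, n_frames + 1):
        # Position n_frames is the fixed end of the signal, not a scored boundary.
        end_score = math.log(q[j]) if j < n_frames else 0.0
        for i in range(j):
            score = (best[i]
                     + (cum[j - 1] - cum[i])      # no boundary strictly inside (i, j)
                     + log_duration_prior(j - i)  # prior on the segment length
                     + end_score)                 # boundary at position j
            if score > best[j]:
                best[j], back[j] = score, i

    # Trace back the interior boundaries from the end of the signal.
    boundaries = []
    j = n_frames
    while j > 0:
        j = back[j]
        if j > 0:
            boundaries.append(j)
    return boundaries[::-1], best[n_frames]


if __name__ == "__main__":
    probs = [0.1, 0.9, 0.8, 0.1, 0.1, 0.9, 0.1]  # illustrative values only
    at_least_two = lambda d: 0.0 if d >= 2 else -math.inf
    print(most_probable_segmentation(probs, at_least_two)[0])
```

With a flat prior this reduces to thresholding each boundary probability at 0.5. The duration prior is what makes dynamic programming necessary: in the example, a minimum segment length of two frames forces a choice between the adjacent candidate boundaries at positions 2 and 3. If no segmentation satisfies the prior, the function returns an empty boundary list with a score of negative infinity.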