Berisha Visar, Utianski Rene, Liss Julie
Department of Speech and Hearing Science, Arizona State University.
Proc IEEE Int Conf Acoust Speech Signal Process. 2013:2825-2828. doi: 10.1109/ICASSP.2013.6638172.
An important yet under-explored problem in speech processing is the automatic assessment of intelligibility for pathological speech. In practice, intelligibility is usually assessed through subjective tests administered by speech pathologists; however, research has shown that these tests are inconsistent, costly, and of poor reliability. Although some automatic methods exist for assessing intelligibility in telecommunications, research specific to pathological speech has been limited. Here, we propose an algorithm that captures important multi-scale perceptual cues shown to correlate well with intelligibility. A nonlinear classifier is trained at each time scale, and a final intelligibility decision is made by combining the classifiers with ensemble learning methods. Preliminary results indicate a marked improvement in intelligibility assessment over published baseline results.
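As a rough illustration of the final step described above, the following sketch combines one decision per time scale by simple majority vote. It is not the authors' implementation: the abstract does not say which ensemble method, classifiers, features, or labels were used, so the scale names ("short", "medium", "long"), the labels ("high", "low"), and the helper `make_toy_classifier` are all hypothetical placeholders.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# Hypothetical type: a per-scale classifier maps a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def make_toy_classifier(threshold: float) -> Classifier:
    """Toy stand-in for a trained nonlinear classifier (assumption, not from the paper).

    It thresholds the squared norm of the feature vector, which is a
    nonlinear function of the features.
    """
    def classify(features: Sequence[float]) -> str:
        energy = sum(v * v for v in features)
        return "high" if energy < threshold else "low"
    return classify


def ensemble_decision(classifiers: Dict[str, Classifier],
                      features: Dict[str, Sequence[float]]) -> str:
    """Majority vote over the labels predicted at each time scale."""
    votes = [clf(features[scale]) for scale, clf in classifiers.items()]
    # most_common(1) returns the label with the highest count; on a tie it
    # returns the tied label that was encountered first.
    return Counter(votes).most_common(1)[0][0]


# Usage with made-up feature vectors, one per (hypothetical) time scale.
classifiers = {
    "short": make_toy_classifier(1.0),
    "medium": make_toy_classifier(1.0),
    "long": make_toy_classifier(1.0),
}
features = {
    "short": [0.1, 0.2],
    "medium": [2.0, 0.0],
    "long": [0.3, 0.3],
}
decision = ensemble_decision(classifiers, features)
print(decision)
```

Majority voting is only one of several ensemble strategies; weighted voting or stacking would fit the same interface by replacing the body of `ensemble_decision`.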