Center for Biomedical Technology, Universidad Politécnica de Madrid, Madrid, Spain.
Front Bioeng Biotechnol. 2016 Jan 20;4:1. doi: 10.3389/fbioe.2016.00001. eCollection 2016.
Many acoustic parameters are employed in pathological voice assessment, and they have served clinicians as tools to distinguish between normophonic and pathological voices. However, many of these parameters require appropriate tuning to maximize their efficiency. In this work, a group of new and previously proposed modulation spectrum (MS) metrics is optimized over different time and frequency ranges, with the aim of maximizing efficiency in the detection of pathological voices. The optimization is performed simultaneously on two different voice databases in order to identify which tuning ranges generalize better. The experiments were cross-validated to ensure the validity of the results, and a third database is used to test the optimized metrics. Despite some differences, the results indicate that the metrics follow similar tendencies during optimization on both tuning databases, confirming the generalization capabilities of the proposed MS metrics. In addition, the tuning process reveals which bands of the modulation spectra carry relevant information for each metric, and these bands have a physical interpretation with respect to the phonatory system. Efficiency values of up to 90.6% are obtained on one tuning database, while on the other the maximum efficiency reaches 71.1%. The results also show a separability between normophonic and pathological states using the proposed metrics, which can be exploited for voice pathology detection or assessment.
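The abstract does not define how the modulation spectrum or the individual metrics are computed, so the following is only a minimal sketch under stated assumptions. It uses one common MS definition (short-time Fourier magnitude, then a second FFT along time for each acoustic-frequency bin) and a hypothetical band-restricted metric, `band_energy_ratio`, standing in for the paper's metrics; `modulation_spectrum`, `band_energy_ratio`, `threshold_efficiency`, and the frame sizes are illustrative assumptions, not the authors' implementation. "Efficiency" is assumed here to mean classification accuracy of a single threshold, computed in-sample, whereas the study cross-validates its results.

```python
import numpy as np


def modulation_spectrum(x, fs, frame_len=512, hop=128):
    """Assumed MS definition: |FFT over time| of each STFT magnitude bin.

    Returns (ms, ac_freqs, mod_freqs), with ms shaped
    (acoustic bins, modulation bins).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Acoustic spectrogram magnitude: rows = frames (time), cols = acoustic bins
    spec = np.abs(np.fft.rfft(frames, axis=1))
    # Second FFT along time turns each bin's magnitude trajectory into
    # modulation frequencies; transpose to (acoustic, modulation)
    ms = np.abs(np.fft.rfft(spec, axis=0)).T
    ac_freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    mod_freqs = np.fft.rfftfreq(n_frames, d=hop / fs)  # frame rate = fs / hop
    return ms, ac_freqs, mod_freqs


def band_energy_ratio(ms, ac_freqs, mod_freqs, ac_band, mod_band):
    """Hypothetical metric: share of MS energy inside a rectangular band.

    ac_band and mod_band are (low, high) ranges in Hz; these are the
    kind of time/frequency ranges a tuning process would sweep.
    """
    ac_mask = (ac_freqs >= ac_band[0]) & (ac_freqs <= ac_band[1])
    mod_mask = (mod_freqs >= mod_band[0]) & (mod_freqs <= mod_band[1])
    energy = ms ** 2
    return energy[np.ix_(ac_mask, mod_mask)].sum() / energy.sum()


def threshold_efficiency(scores, labels):
    """Best in-sample accuracy of one threshold on a 1-D metric.

    Both polarities are tried, since a metric may be higher or lower
    for pathological voices.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = 0.0
    for t in np.unique(scores):
        pred = scores >= t
        acc = max(np.mean(pred == labels), np.mean(~pred == labels))
        best = max(best, acc)
    return best
```

Under these assumptions, a tuning step would sweep `ac_band` and `mod_band` over a grid, compute the metric for every recording of each database, and keep the ranges whose cross-validated efficiency is highest on both databases. A slowly amplitude-modulated tone, for example, concentrates energy at the modulation frequency of its envelope, which is the kind of band-specific information the metric is meant to capture.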