Joint Dictionary Learning-Based Non-Negative Matrix Factorization for Voice Conversion to Improve Speech Intelligibility After Oral Surgery.

Authors

Fu Szu-Wei, Li Pei-Chun, Lai Ying-Hui, Yang Cheng-Chien, Hsieh Li-Chun, Tsao Yu

Affiliations

Department of Computer Science and Information Engineering, National Taiwan University.

Department of Audiology and Speech Language Pathology, Mackay Medical College.

Publication information

IEEE Trans Biomed Eng. 2017 Nov;64(11):2584-2594. doi: 10.1109/TBME.2016.2644258.

Abstract

This paper focuses on machine-learning-based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because parts of the articulators have been removed, a patient's speech may be distorted and difficult to understand. To overcome this problem, VC methods can be applied to convert the distorted speech so that it is clearer and more intelligible. To design an effective VC method, two key points must be considered: 1) the amount of training data may be limited (because speaking for a long time is usually difficult for postoperative patients); 2) rapid conversion is desirable (for better communication). We propose a novel joint dictionary learning-based non-negative matrix factorization (JD-NMF) algorithm. Compared to conventional VC techniques, JD-NMF can perform VC efficiently and effectively with only a small amount of training data. The experimental results demonstrate that the proposed JD-NMF method not only achieves notably higher short-time objective intelligibility (STOI) scores (a standardized objective intelligibility evaluation metric) than those obtained using the original unconverted speech but is also significantly more efficient and effective than a conventional exemplar-based NMF VC method. The proposed JD-NMF method may outperform the state-of-the-art exemplar-based NMF VC method in terms of STOI scores under the desired scenario. We confirmed the advantages of the proposed joint training criterion for NMF-based VC. Moreover, we verified that the proposed JD-NMF can effectively improve the speech intelligibility scores of oral surgery patients.
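
The abstract does not spell out the update rules, so the following is only a minimal sketch of the general idea behind joint dictionary learning for NMF-based VC, not the authors' implementation. It assumes that source (distorted) and target (clear) magnitude spectrograms have already been time-aligned frame by frame, that both dictionaries are learned with a shared activation matrix by stacking the two spectrograms, and that the standard Euclidean-cost multiplicative updates are used. The function names `learn_joint_dictionaries` and `convert` are hypothetical.

```python
import numpy as np

def learn_joint_dictionaries(V_src, V_tgt, rank, n_iter=200, eps=1e-9, seed=0):
    """Learn source and target dictionaries that share one activation matrix.

    V_src, V_tgt: non-negative arrays of shape (F, T), assumed time-aligned.
    Returns (W_src, W_tgt), each of shape (F, rank).
    """
    rng = np.random.default_rng(seed)
    V = np.vstack([V_src, V_tgt])                  # (2F, T) stacked features
    W = rng.random((V.shape[0], rank)) + eps       # stacked dictionary
    H = rng.random((rank, V.shape[1])) + eps       # shared activations
    for _ in range(n_iter):
        # Multiplicative updates for the Euclidean cost (assumed here).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    F = V_src.shape[0]
    return W[:F], W[F:]

def convert(V_src, W_src, W_tgt, n_iter=200, eps=1e-9, seed=0):
    """Estimate activations on the fixed source dictionary, then
    reconstruct with the target dictionary."""
    rng = np.random.default_rng(seed)
    H = rng.random((W_src.shape[1], V_src.shape[1])) + eps
    for _ in range(n_iter):
        # Only H is updated; the dictionaries stay fixed at conversion time.
        H *= (W_src.T @ V_src) / (W_src.T @ W_src @ H + eps)
    return W_tgt @ H
```

In this sketch the dictionaries are compact (`rank` columns) rather than built from every training frame, which is one plausible reason a learned-dictionary approach could convert faster than an exemplar-based one. Calling `convert(V_new, W_src, W_tgt)` on a new source spectrogram returns an estimated target spectrogram of the same shape.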

