Voice Conversion for Persons with Amyotrophic Lateral Sclerosis.

Publication information

IEEE J Biomed Health Inform. 2020 Oct;24(10):2942-2949. doi: 10.1109/JBHI.2019.2961844. Epub 2019 Dec 25.

Abstract

Amyotrophic lateral sclerosis (ALS) results in progressive paralysis of voluntary muscles throughout the body. As speech deteriorates, individuals rely on pre-programmed messages available on commercial speech generating devices to communicate using one of the generic electronic voices on the device. To replace these generic voices and restore vocal identity, our aim is to develop personalized voices for people with ALS via the approach of voice conversion. The task is challenging because very few people have large quantities of their premorbid healthy speech recorded. Therefore, we have to rely on small quantities of dysarthric speech concomitant with an individual's disease stage. Further, progressive fatigue prohibits acquisition of large speech datasets and individuals display a range of dysarthria severities resulting from breathing, voice, articulation, resonance, and prosody disturbances. As the first step to address these problems, we use healthy source speakers and propose the approach of combining a structured sparse spectral transform with multiple linear regression-based frequency warping prediction for spectral conversion, and interpolating the transformed spectral frames for speech rate modification. Our experimental data included four healthy source speakers from the ARCTIC dataset, and four target ALS speakers with mild to severe dysarthria, forming 16 speaker pairs. Subjective listening evaluations showed that on average, (i) the proposed approach improved speech intelligibility by about 80% over the target speakers' speech, (ii) the converted voice was 3 times more similar to the target speakers' speech than to the source speakers' speech, and (iii) the converted speech quality was close to the MOS scale "good" relative to the source speakers' speech being "excellent."
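
The abstract mentions interpolating the transformed spectral frames to modify speech rate. The sketch below illustrates that general idea only, assuming plain linear interpolation along the time axis; the abstract does not specify the interpolation scheme, so the function name `interpolate_frames` and the `rate` parameter are hypothetical and not taken from the paper.

```python
from typing import List

Frame = List[float]  # one spectral frame, e.g. a vector of spectral coefficients


def interpolate_frames(frames: List[Frame], rate: float) -> List[Frame]:
    """Resample a frame sequence in time by linear interpolation.

    Assumption (not from the paper): rate > 1 yields fewer frames (faster
    speech), rate < 1 yields more frames (slower speech).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not frames:
        return []
    n_in = len(frames)
    n_out = max(1, round(n_in / rate))
    if n_in == 1 or n_out == 1:
        # Nothing to interpolate between; repeat the first frame.
        return [list(frames[0]) for _ in range(n_out)]
    step = (n_in - 1) / (n_out - 1)  # spacing of output frames on the input time axis
    out: List[Frame] = []
    for k in range(n_out):
        pos = k * step
        i = min(int(pos), n_in - 2)  # index of the left neighbouring input frame
        w = pos - i                  # weight of the right neighbouring input frame
        out.append([(1 - w) * a + w * b for a, b in zip(frames[i], frames[i + 1])])
    return out


# Usage: slow a three-frame sequence to half speed (six output frames).
frames = [[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]]
slowed = interpolate_frames(frames, rate=0.5)
```

The first and last output frames coincide with the first and last input frames, so only the timing changes while the spectral content at the endpoints is preserved.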
