Regondi Stefano, Donvito Giordana, Frontoni Emanuele, Kostovic Milutin, Minazzi Fabio, Bratières Sébastien, Filosto Massimiliano, Pugliese Raffaele
NeMO Lab, ASST GOM Niguarda Cà Granda Hospital, Milan, Italy.
NEuroMuscular Omnicenter (NEMO), Fondazione Serena Onlus, Milan, Italy.
Sci Rep. 2025 Jan 8;15(1):1361. doi: 10.1038/s41598-024-84728-y.
Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease that can cause progressive loss of speech due to bulbar dysfunction, which can have a significant negative impact on the patient's mental well-being. Augmentative and Alternative Communication (AAC) strategies based on synthetic voices have been shown to help patients maintain communication and improve their Quality of Life (QoL). However, such synthetic voices are often perceived as impersonal and fail to capture the patient's unique voice and identity. To address this, combining voice banking (VB) with artificial intelligence (AI) has emerged as a more natural communication strategy, enabling individuals to preserve their voice for use with AAC devices when needed. VB involves recording speech samples to generate a synthetic voice that closely resembles the individual's own. Despite growing interest in VB, clear strategies for implementing it effectively in rapidly progressing diseases such as ALS are lacking. In addition, how patients with preserved speech perceive the quality of a banked voice, especially when VB is offered early in the disease, remains poorly understood. This study therefore aims to assess the effectiveness and perceptual impact of AI-generated voices on ALS patients with preserved speech, using a personalized voice synthesis system based on machine learning. The patient-specific voice is obtained by recording the patient's speech and then fine-tuning with HiFi-GAN (Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis), yielding a model capable of producing speech highly similar to the patient's own voice, with exceptional expressive and audio quality. In doing so, the study intends to offer insights into the potential benefits and challenges of combining VB with AI voices to enhance communication support for ALS patients.