Effective Acoustic Model-Based Beamforming Training for Static and Dynamic HRI Applications.

Affiliations

Speech Processing and Transmission Laboratory, Electrical Engineering Department, University of Chile, Santiago 8370451, Chile.

Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Publication information

Sensors (Basel). 2024 Oct 15;24(20):6644. doi: 10.3390/s24206644.

Abstract

Human-robot collaboration will play an important role in the fourth industrial revolution in applications related to hostile environments, mining, industry, forestry, education, natural disasters, and defense. Effective collaboration requires robots to understand human intentions and tasks, which involves advanced user profiling. Voice-based communication, rich in complex information, is key to this. Beamforming, a technology that enhances speech signals, can help robots extract semantic, emotional, or health-related information from speech. This paper describes the implementation of a system that provides substantially improved signal-to-noise ratio (SNR) and speech recognition accuracy to a moving robotic platform for use in human-robot interaction (HRI) applications in static and dynamic contexts. This study focuses on training deep learning-based beamformers using acoustic model-based multi-style training with measured room impulse responses (RIRs). The results show that this approach outperforms training with simulated RIRs or matched measured RIRs, especially in dynamic conditions involving robot motion. The findings suggest that training with a broad range of measured RIRs is sufficient for effective HRI in various environments, making additional data recording or augmentation unnecessary. This research demonstrates that deep learning-based beamforming can significantly improve HRI performance, particularly in challenging acoustic environments, surpassing traditional beamforming methods.
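
The abstract does not spell out how training data is built from RIRs, so the following is only a minimal sketch of one common way to do it: convolve a clean utterance with one RIR per microphone and add noise at a chosen SNR. The function names (`simulate_multichannel`, `measure_snr_db`), the array shapes, and the synthetic decaying RIRs are all assumptions made for illustration; they stand in for the measured RIRs and are not the authors' pipeline.

```python
import numpy as np

def simulate_multichannel(clean, rirs, noise, snr_db):
    """Hypothetical helper: reverberate `clean` with one RIR per microphone
    and add `noise` scaled to reach the requested SNR (in dB).

    clean: (n_samples,), rirs: (n_mics, rir_len), noise: (n_mics, >= n_samples)
    """
    n = len(clean)
    # Reverberant speech at each microphone, truncated to the clean length.
    speech = np.stack([np.convolve(clean, h)[:n] for h in rirs])
    noise = noise[:, :n]
    # Gain that makes speech power / scaled noise power equal 10^(snr_db / 10).
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    scaled_noise = gain * noise
    return speech + scaled_noise, speech, scaled_noise

def measure_snr_db(signal, noise):
    """SNR in dB computed from average power over all channels."""
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)            # stand-in for 1 s of speech at 16 kHz
decay = np.exp(-np.arange(800) / 200.0)       # synthetic exponential decay envelope
rirs = rng.standard_normal((4, 800)) * decay  # assumed 4-microphone array
noise = rng.standard_normal((4, 16000))

mixture, speech, scaled_noise = simulate_multichannel(clean, rirs, noise, snr_db=5.0)
print(mixture.shape, round(measure_snr_db(speech, scaled_noise), 3))
```

In a multi-style setup of this kind, the same utterance would be paired with many different RIRs and SNR values so that the beamformer sees a broad range of acoustic conditions during training.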

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e020/11511554/1d3e7bcd10eb/sensors-24-06644-g001.jpg
