
Effective Acoustic Model-Based Beamforming Training for Static and Dynamic HRI Applications.

Affiliations

Speech Processing and Transmission Laboratory, Electrical Engineering Department, University of Chile, Santiago 8370451, Chile.

Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Publication Information

Sensors (Basel). 2024 Oct 15;24(20):6644. doi: 10.3390/s24206644.

DOI: 10.3390/s24206644
PMID: 39460124
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11511554/
Abstract

Human-robot collaboration will play an important role in the fourth industrial revolution in applications related to hostile environments, mining, industry, forestry, education, natural disaster and defense. Effective collaboration requires robots to understand human intentions and tasks, which involves advanced user profiling. Voice-based communication, rich in complex information, is key to this. Beamforming, a technology that enhances speech signals, can help robots extract semantic, emotional, or health-related information from speech. This paper describes the implementation of a system that provides substantially improved signal-to-noise ratio (SNR) and speech recognition accuracy to a moving robotic platform for use in human-robot interaction (HRI) applications in static and dynamic contexts. This study focuses on training deep learning-based beamformers using acoustic model-based multi-style training with measured room impulse responses (RIRs). The results show that this approach outperforms training with simulated RIRs or matched measured RIRs, especially in dynamic conditions involving robot motion. The findings suggest that training with a broad range of measured RIRs is sufficient for effective HRI in various environments, making additional data recording or augmentation unnecessary. This research demonstrates that deep learning-based beamforming can significantly improve HRI performance, particularly in challenging acoustic environments, surpassing traditional beamforming methods.

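The abstract's central idea is multi-style (multi-condition) training: generating diverse training examples by convolving clean speech with measured room impulse responses (RIRs) and adding noise. The sketch below illustrates that data-generation step only; it is not the paper's actual pipeline, and the function name, RIR list, and SNR handling are assumptions for illustration.

```python
import numpy as np

def make_multicondition_example(clean, rirs, noise, snr_db, rng):
    """Create one noisy-reverberant training example: convolve clean
    speech with a randomly drawn measured RIR, then add noise at the
    requested SNR (one 'style' in multi-style training)."""
    rir = rirs[rng.integers(len(rirs))]
    # reverberate and truncate back to the clean-signal length
    reverberant = np.convolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberant)]
    sig_pow = np.mean(reverberant ** 2)
    noise_pow = np.mean(noise ** 2) + 1e-12
    # scale so that 10*log10(sig_pow / (scale**2 * noise_pow)) == snr_db
    scale = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return reverberant + scale * noise
```

Drawing the RIR at random from a broad set of measured responses is what exposes the beamformer to many acoustic conditions at once, which is the property the paper credits for robustness in dynamic, moving-robot scenarios.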

Figures (g001–g010, PMC11511554):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e020/11511554/1d3e7bcd10eb/sensors-24-06644-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e020/11511554/70735645a6ad/sensors-24-06644-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e020/11511554/ffb1922a309b/sensors-24-06644-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e020/11511554/a9940028b01e/sensors-24-06644-g004a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e020/11511554/07064c23c044/sensors-24-06644-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e020/11511554/5f65fa99bd22/sensors-24-06644-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e020/11511554/9b1964ba6dfa/sensors-24-06644-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e020/11511554/0f03ae320894/sensors-24-06644-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e020/11511554/66b146c02c33/sensors-24-06644-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e020/11511554/7edfe759105c/sensors-24-06644-g010.jpg

Similar Articles

1. Effective Acoustic Model-Based Beamforming Training for Static and Dynamic HRI Applications.
Sensors (Basel). 2024 Oct 15;24(20):6644. doi: 10.3390/s24206644.
2. Automatic Detection of Dyspnea in Real Human-Robot Interaction Scenarios.
Sensors (Basel). 2023 Sep 1;23(17):7590. doi: 10.3390/s23177590.
3. A unified beamforming and source separation model for static and dynamic human-robot interaction.
JASA Express Lett. 2024 Mar 1;4(3). doi: 10.1121/10.0025238.
4. Robot-Assisted Pedestrian Regulation Based on Deep Reinforcement Learning.
IEEE Trans Cybern. 2020 Apr;50(4):1669-1682. doi: 10.1109/TCYB.2018.2878977. Epub 2018 Nov 20.
5. Improving gesture-based interaction between an assistive bathing robot and older adults via user training on the gestural commands.
Arch Gerontol Geriatr. 2020 Mar-Apr;87:103996. doi: 10.1016/j.archger.2019.103996. Epub 2019 Dec 13.
6. Beamforming for directional sources: additional estimator and evaluation of performance under different acoustic scenarios.
J Acoust Soc Am. 2011 Apr;129(4):2042-51. doi: 10.1121/1.3557055.
7. Recent advancements in multimodal human-robot interaction.
Front Neurorobot. 2023 May 11;17:1084000. doi: 10.3389/fnbot.2023.1084000. eCollection 2023.
8. Human-Robot Interaction and Social Robot: The Emerging Field of Healthcare Robotics and Current and Future Perspectives for Spinal Care.
Neurospine. 2024 Sep;21(3):868-877. doi: 10.14245/ns.2448432.216. Epub 2024 Sep 30.
9. Robust acoustic source localization based on modal beamforming and time-frequency processing using circular microphone arrays.
J Acoust Soc Am. 2012 Sep;132(3):1511-20. doi: 10.1121/1.4740503.
10. An Experimental Safety Response Mechanism for an Autonomous Moving Robot in a Smart Manufacturing Environment Using Q-Learning Algorithm and Speech Recognition.
Sensors (Basel). 2022 Jan 26;22(3):941. doi: 10.3390/s22030941.

References Cited in This Article

1. Moving sound source localization and tracking for an autonomous robot equipped with a self-rotating bi-microphone array.
J Acoust Soc Am. 2023 Aug 1;154(2):1261-1273. doi: 10.1121/10.0020583.
2. Three-stage hybrid neural beamformer for multi-channel speech enhancement.
J Acoust Soc Am. 2023 Jun 1;153(6):3378. doi: 10.1121/10.0019802.
3. Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.
IEEE/ACM Trans Audio Speech Lang Process. 2019 Aug;27(8):1256-1266. doi: 10.1109/TASLP.2019.2915167. Epub 2019 May 6.
4. Complex Ratio Masking for Monaural Speech Separation.
IEEE/ACM Trans Audio Speech Lang Process. 2016 Mar;24(3):483-492. doi: 10.1109/TASLP.2015.2512042. Epub 2015 Dec 23.
5. Voice - How humans communicate?
J Nat Sci Biol Med. 2012 Jan;3(1):3-11. doi: 10.4103/0976-9668.95933.
6. Prediction of intent in robotics and multi-agent systems.
Cogn Process. 2007 Sep;8(3):151-8. doi: 10.1007/s10339-007-0168-9. Epub 2007 May 4.