Detecting Lombard Speech Using Deep Learning Approach.

Affiliations

PGS Software, 50-086 Wrocław, Poland.

Institute of Data Science and Digital Technologies, Vilnius University, LT-08412 Vilnius, Lithuania.

Publication information

Sensors (Basel). 2022 Dec 28;23(1):315. doi: 10.3390/s23010315.

DOI:10.3390/s23010315
PMID:36616913
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9824848/
Abstract

Robust detection of Lombard speech in noise is challenging. This study proposes a strategy for detecting Lombard speech with a machine learning approach, aimed at applications such as public address systems that operate in near real time. The paper starts with background on the Lombard effect and then outlines the assumptions made for Lombard speech detection. The proposed framework combines convolutional neural networks (CNNs) with various two-dimensional (2D) speech signal representations. To reduce the computational cost without abandoning the 2D representation-based approach, a strategy for threshold-based averaging of the Lombard effect detection results is introduced, and pseudocode for the averaging process is included. A series of experiments is performed to determine the most effective network structure and 2D speech signal representation. Investigations are carried out on German and Polish recordings containing Lombard speech. All 2D speech signal representations are tested with and without augmentation, where augmentation means using the alpha channel to store additional data: the gender of the speaker, the F0 frequency, and the first two MFCCs. The experimental results show that Lombard and neutral speech recordings can be clearly distinguished with high detection accuracy. It is also demonstrated that the proposed detection process is capable of working in near real time. These are the key contributions of this work.
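
The threshold-based averaging mentioned in the abstract can be illustrated with a minimal sketch. The paper's own pseudocode is not reproduced on this page, so everything below is an assumption made for illustration: the function name `averaged_lombard_decisions`, the per-frame score scale (0 for neutral, 1 for Lombard), the window length, and the threshold value. It shows only one plausible reading of the idea, in which per-frame classifier scores are averaged over a short sliding window and the mean is compared with a threshold.

```python
from collections import deque
from typing import Iterable, Iterator


def averaged_lombard_decisions(
    frame_scores: Iterable[float],
    window: int = 3,         # assumed window length, not taken from the paper
    threshold: float = 0.5,  # assumed decision threshold, not taken from the paper
) -> Iterator[bool]:
    """Yield one decision per frame.

    A frame is flagged as Lombard speech when the mean of the most recent
    `window` scores (fewer at the start of the stream) reaches `threshold`.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # oldest score is dropped automatically
    for score in frame_scores:
        recent.append(score)
        yield sum(recent) / len(recent) >= threshold


# Hypothetical per-frame scores, e.g. CNN outputs for consecutive 2D frames.
scores = [0.9, 0.8, 0.3, 0.9, 0.8, 0.1, 0.1, 0.2]
decisions = list(averaged_lombard_decisions(scores))
print(decisions)
```

With these hypothetical scores, the isolated low score at the third frame does not flip the decision, because the window mean stays above the threshold. The decision changes only once low scores persist for consecutive frames.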


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fff/9824848/b449578816c0/sensors-23-00315-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fff/9824848/f50e3bc153f7/sensors-23-00315-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fff/9824848/6c69e13c7042/sensors-23-00315-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fff/9824848/51ccd75b825f/sensors-23-00315-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fff/9824848/bffdd6dce0cf/sensors-23-00315-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fff/9824848/e0ab6cd0eba1/sensors-23-00315-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fff/9824848/5306267698f2/sensors-23-00315-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fff/9824848/f0315a1b1a45/sensors-23-00315-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fff/9824848/6550552f43c2/sensors-23-00315-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fff/9824848/7783e940151a/sensors-23-00315-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fff/9824848/f2959bfa67c1/sensors-23-00315-g011.jpg

Similar articles

1
Detecting Lombard Speech Using Deep Learning Approach.
Sensors (Basel). 2022 Dec 28;23(1):315. doi: 10.3390/s23010315.
2
The Lombard reflex and its role on human listeners and automatic speech recognizers.
J Acoust Soc Am. 1993 Jan;93(1):510-24. doi: 10.1121/1.405631.
3
The intelligibility of Lombard speech for non-native listeners.
J Acoust Soc Am. 2012 Aug;132(2):1120-9. doi: 10.1121/1.4732062.
4
Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition.
IEEE/ACM Trans Audio Speech Lang Process. 2021;29:927-942. doi: 10.1109/taslp.2021.3053388. Epub 2021 Jan 21.
5
Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data.
Sensors (Basel). 2020 Apr 19;20(8):2326. doi: 10.3390/s20082326.
6
Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language.
Sensors (Basel). 2022 May 12;22(10):3683. doi: 10.3390/s22103683.
7
Enhanced amplitude modulations contribute to the Lombard intelligibility benefit: Evidence from the Nijmegen Corpus of Lombard Speech.
J Acoust Soc Am. 2020 Feb;147(2):721. doi: 10.1121/10.0000646.
8
The Lombard effect observed in speech produced by cochlear implant users in noisy environments: A naturalistic study.
J Acoust Soc Am. 2017 Apr;141(4):2788. doi: 10.1121/1.4979927.
9
Machine learning based sample extraction for automatic speech recognition using dialectal Assamese speech.
Neural Netw. 2016 Jun;78:97-111. doi: 10.1016/j.neunet.2015.12.010. Epub 2015 Dec 30.
10
A corpus of audio-visual Lombard speech with frontal and profile views.
J Acoust Soc Am. 2018 Jun;143(6):EL523. doi: 10.1121/1.5042758.
