Capturing Time Dynamics From Speech Using Neural Networks for Surgical Mask Detection.

Author information

Liu Shuo, Mallol-Ragolta Adria, Yan Tianhao, Qian Kun, Parada-Cabaleiro Emilia, Hu Bin, Schuller Bjorn W

Publication information

IEEE J Biomed Health Inform. 2022 Aug;26(8):4291-4302. doi: 10.1109/JBHI.2022.3173128. Epub 2022 Aug 11.

DOI:10.1109/JBHI.2022.3173128
PMID:35522639
Abstract

The importance of detecting whether a person wears a face mask while speaking has tremendously increased since the outbreak of SARS-CoV-2 (COVID-19), as wearing a mask can help to reduce the spread of the virus and mitigate the public health crisis. Besides affecting human speech characteristics related to frequency, face masks cause temporal interferences in speech, altering the pace, rhythm, and pronunciation speed. In this regard, this paper presents two effective neural network models to detect surgical masks from audio. The proposed architectures are both based on Convolutional Neural Networks (CNNs), chosen as an optimal approach for the spatial processing of the audio signals. One architecture applies a Long Short-Term Memory (LSTM) network to model the time dependencies. Through an additional attention mechanism, the LSTM-based architecture enables the extraction of more salient temporal information. The other architecture (named ConvTx) retrieves the relative position of a sequence through the positional encoder of a Transformer module. To assess to what extent both architectures can complement each other when modelling temporal dynamics, we also explore the combination of LSTM and Transformers in three hybrid models. Finally, we also investigate whether data augmentation techniques, such as using transitions between audio frames, and gender-dependent frameworks might affect the performance of the proposed architectures. Our experimental results show that one of the hybrid models achieves the best performance, surpassing existing state-of-the-art results for the task at hand.
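
The two temporal mechanisms named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not say which positional encoding or attention scoring the models use, so the sketch assumes the standard sinusoidal encoding from the original Transformer and a simple dot-product attention pooling. The function names `sinusoidal_positional_encoding` and `attention_pool`, and the `query` vector, are hypothetical.

```python
import math


def sinusoidal_positional_encoding(num_frames, dim):
    """Return a num_frames x dim table of sinusoidal position codes.

    Assumption: the standard Transformer scheme, where even columns hold
    sin(pos / 10000^(i/dim)) and odd columns hold the matching cosine.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_frames):
        row = []
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


def attention_pool(frames, query):
    """Collapse a sequence of frame vectors into one vector.

    Assumption: each frame is scored by its dot product with `query`
    (a stand-in for a learned parameter); a softmax over time turns the
    scores into weights, and the output is the weighted average.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights
```

In a model of this kind, the position table would typically be added to the CNN feature sequence before the Transformer layers, while the pooling step would be applied to the per-frame LSTM outputs so that more informative frames contribute more to the final decision.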


Similar articles

1
Capturing Time Dynamics From Speech Using Neural Networks for Surgical Mask Detection.
IEEE J Biomed Health Inform. 2022 Aug;26(8):4291-4302. doi: 10.1109/JBHI.2022.3173128. Epub 2022 Aug 11.
2
A novel hybrid face mask detection approach using Transformer and convolutional neural network models.
PeerJ Comput Sci. 2023 Mar 27;9:e1265. doi: 10.7717/peerj-cs.1265. eCollection 2023.
3
Toward Realigning Automatic Speaker Verification in the Era of COVID-19.
Sensors (Basel). 2022 Mar 30;22(7):2638. doi: 10.3390/s22072638.
4
Face masks and speaking style affect audio-visual word recognition and memory of native and non-native speech.
J Acoust Soc Am. 2021 Jun;149(6):4013. doi: 10.1121/10.0005191.
5
End-to-end multimodal clinical depression recognition using deep neural networks: A comparative analysis.
Comput Methods Programs Biomed. 2021 Nov;211:106433. doi: 10.1016/j.cmpb.2021.106433. Epub 2021 Sep 28.
6
Transformer-based CNNs: Mining Temporal Context Information for Multi-sound COVID-19 Diagnosis.
Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:2335-2338. doi: 10.1109/EMBC46164.2021.9629552.
7
An Efficient and Effective Deep Learning-Based Model for Real-Time Face Mask Detection.
Sensors (Basel). 2022 Mar 29;22(7):2602. doi: 10.3390/s22072602.
8
Detecting COVID-19 patients via MLES-Net deep learning models from X-Ray images.
BMC Med Imaging. 2022 Jul 30;22(1):135. doi: 10.1186/s12880-022-00861-y.
9
A Hybrid Time-Distributed Deep Neural Architecture for Speech Emotion Recognition.
Int J Neural Syst. 2022 Jun;32(6):2250024. doi: 10.1142/S0129065722500241. Epub 2022 May 12.
10
Low-cost measurement of face mask efficacy for filtering expelled droplets during speech.
Sci Adv. 2020 Sep 2;6(36). doi: 10.1126/sciadv.abd3083. Print 2020 Sep.