Face mask recognition from audio: The MASC database and an overview on the mask challenge.

Authors

Mohamed Mostafa M, Nessiem Mina A, Batliner Anton, Bergler Christian, Hantke Simone, Schmitt Maximilian, Baird Alice, Mallol-Ragolta Adria, Karas Vincent, Amiriparian Shahin, Schuller Björn W

Affiliations

Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany.

AI R&D Team SyncPilot GmbH, Augsburg, Germany.

Publication information

Pattern Recognit. 2022 Feb;122:108361. doi: 10.1016/j.patcog.2021.108361. Epub 2021 Oct 4.

Abstract

The sudden outbreak of COVID-19 has resulted in tough challenges for the field of biometrics due to its spread via physical contact, and the regulations of wearing face masks. Given these constraints, voice biometrics can offer a suitable contact-less biometric solution; they can benefit from models that classify whether a speaker is wearing a mask or not. This article reviews the Mask Sub-Challenge (MSC) of the INTERSPEECH 2020 COMputational PARalinguistics challengE (ComParE), which focused on the following classification task: Given an audio chunk of a speaker, classify whether the speaker is wearing a mask or not. First, we report the collection of the Mask Augsburg Speech Corpus (MASC) and the baseline approaches used to solve the problem, achieving a performance of Unweighted Average Recall (UAR). We then summarise the methodologies explored in the submitted and accepted papers that mainly used two common patterns: (i) phonetic-based audio features, or (ii) spectrogram representations of audio combined with Convolutional Neural Networks (CNNs) typically used in image processing. Most approaches enhance their models by adapting ensembles of different models and attempting to increase the size of the training data using various techniques. We review and discuss the results of the participants of this sub-challenge, where the winner scored a UAR of . Moreover, we present the results of fusing the approaches, leading to a UAR of . Finally, we present a smartphone app that can be used as a proof of concept demonstration to detect in real-time whether users are wearing a face mask; we also benchmark the run-time of the best models.
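Because every result in the abstract is reported as Unweighted Average Recall, a short worked example of the metric may help. UAR is the mean of the per-class recalls, so the "mask" and "no mask" classes count equally even when one has more samples. The sketch below also fuses several systems by majority vote. This is an assumption made purely for illustration: the abstract does not state which fusion scheme was used, and the function names, labels, and toy predictions are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence


def unweighted_average_recall(y_true: Sequence[Hashable],
                              y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recalls; each class counts equally, whatever its size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    total = Counter(y_true)        # number of samples per true class
    correct = defaultdict(int)     # correctly recognised samples per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / n for c, n in total.items()]
    return sum(recalls) / len(recalls)


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Fuse per-system label sequences by taking the most frequent label per sample.

    Assumption: simple late fusion; use an odd number of systems for a
    two-class task so that ties cannot occur.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy data (hypothetical): four "mask" chunks and two "clear" chunks.
y_true = ["mask", "mask", "mask", "mask", "clear", "clear"]
sys_a = ["mask", "mask", "mask", "clear", "clear", "mask"]
sys_b = ["mask", "mask", "clear", "mask", "clear", "clear"]
sys_c = ["mask", "clear", "mask", "mask", "mask", "clear"]

for name, pred in [("A", sys_a), ("B", sys_b), ("C", sys_c)]:
    print(name, unweighted_average_recall(y_true, pred))

fused = majority_vote([sys_a, sys_b, sys_c])
print("fused", unweighted_average_recall(y_true, fused))
```

In this toy case each system misclassifies different chunks, so the vote corrects every individual error. Real systems tend to make correlated errors, and any gain from fusion is correspondingly smaller.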

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b713/8489285/2988bb4b2fd5/gr1_lrg.jpg