Toward Realigning Automatic Speaker Verification in the Era of COVID-19.

Author information

Department of Computer Science and Engineering, Oakland University, Rochester, MI 48309, USA.

Department of Software Engineering, University of Engineering and Technology, Taxila 47050, Pakistan.

Publication information

Sensors (Basel). 2022 Mar 30;22(7):2638. doi: 10.3390/s22072638.

Abstract

The use of face masks has increased dramatically since the start of the COVID-19 pandemic in order to curb the spread of the disease. Additionally, breakthrough infections caused by the Delta and Omicron variants have further increased the importance of wearing a face mask, even for vaccinated individuals. However, face masks also attenuate speech signals, and this change may affect speech processing technologies, e.g., automatic speaker verification (ASV) and speech-to-text conversion. In this paper, we examine ASV systems on speech samples recorded with three different types of face mask (surgical, cloth, and filtered N95) and analyze the impact on acoustics and other factors. In addition, we explore the effect of different microphones, the distance from the microphone, and the impact of face masks when speakers use ASV systems in real-world scenarios. Our analysis shows a significant deterioration in performance when an ASV system encounters different face masks, microphones, and variable distances between the subject and the microphone. To address this problem, this paper proposes a novel framework that overcomes performance degradation in these scenarios by realigning the ASV system. The novelty of the proposed ASV framework is as follows. First, we propose a fused feature descriptor that concatenates the novel Ternary Deviated overlapping Patterns (TDoP), Mel Frequency Cepstral Coefficients (MFCC), and Gammatone Cepstral Coefficients (GTCC); this descriptor is used by both the ensemble learning-based ASV and the anomaly detection system in the proposed architecture. Second, this paper proposes an anomaly detection model for identifying vocal samples produced in the presence of face masks. Next, it presents a Peak Norm (PN) filter that approximates the signal of the speaker without a face mask in order to boost the accuracy of ASV systems. Finally, the features of samples filtered with the PN filter and of samples recorded without face masks are passed to the proposed ASV to test for improved accuracy. The proposed ASV system achieved accuracies of 0.99 and 0.92 on samples recorded without a face mask and with different face masks, respectively. Although the use of face masks affects the ASV system, the PN filtering solution overcomes this deficiency by up to 4%. Similarly, when the system was exposed to different microphones and distances, the PN approach enhanced accuracy by up to 7% and 9%, respectively. The results demonstrate the effectiveness of the presented framework on an in-house, diverse Multi Speaker Face Masks (MSFM) dataset (IRB No. FY2021-83), consisting of samples of subjects recorded with a variety of face masks and microphones and from different distances.
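
The fused feature descriptor mentioned in the abstract is formed by concatenation. The following is a minimal sketch of that single step only, under stated assumptions: the abstract does not define how TDoP, MFCC, or GTCC are computed or what their dimensions are, so the three arrays below are random placeholders with hypothetical sizes (20, 13, and 13 values per frame), and the helper name fuse_features is illustrative rather than taken from the paper.

```python
import numpy as np


def fuse_features(*feature_blocks):
    """Concatenate per-frame feature matrices along the feature axis.

    Each block is expected to have shape (n_frames, n_features_i).
    Illustrative helper only; not the authors' implementation.
    """
    frame_counts = {block.shape[0] for block in feature_blocks}
    if len(frame_counts) != 1:
        raise ValueError("all feature blocks must share the same number of frames")
    return np.concatenate(feature_blocks, axis=1)


# Placeholder features: random values stand in for real descriptors.
# The per-frame dimensions (20, 13, 13) are assumptions for illustration.
rng = np.random.default_rng(0)
n_frames = 100
tdop = rng.random((n_frames, 20))
mfcc = rng.random((n_frames, 13))
gtcc = rng.random((n_frames, 13))

fused = fuse_features(tdop, mfcc, gtcc)
print(fused.shape)  # (100, 46)
```

In this sketch each row of fused is one frame's combined descriptor, which a downstream classifier or anomaly detector could consume; the actual framing, feature extraction, and classifier choices are described in the full paper, not here.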

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d1e/9003118/39a73f6ae73f/sensors-22-02638-g001.jpg