Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:998-1001. doi: 10.1109/EMBC48229.2022.9871125.
This work focuses on the automatic detection of COVID-19 from the analysis of vocal sounds, including sustained vowels, coughs, and speech recorded while reading a short text. Specifically, we use Mel-spectrogram representations of these acoustic signals to train neural network-based models for the task at hand. Deep learnt representations are extracted from the Mel-spectrograms with Convolutional Neural Networks (CNNs). To guide the training of the embeddings towards more separable and robust inter-class representations, we explore the use of a triplet loss function. The experiments are conducted on the Your Voice Counts dataset, a new dataset of German speakers recorded with smartphones. The results obtained support the suitability of triplet loss-based models for detecting COVID-19 from vocal sounds. The best Unweighted Average Recall (UAR) of 66.5% is obtained with a triplet loss-based model exploiting vocal sounds recorded while reading.
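To make the training objective concrete: a triplet loss compares an anchor embedding against a positive (same class, e.g. COVID-19 positive) and a negative (other class), penalizing the model unless the anchor-positive distance is smaller than the anchor-negative distance by at least a margin. The sketch below is a minimal NumPy illustration of the standard formulation; the toy vectors stand in for CNN embeddings of Mel-spectrograms and are not taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss:
    max(0, d(anchor, positive) - d(anchor, negative) + margin),
    with d the squared Euclidean distance."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings standing in for CNN outputs (hypothetical values)
a = np.array([1.0, 0.0])   # anchor
p = np.array([0.9, 0.1])   # same class as anchor
n = np.array([0.8, 0.2])   # other class, currently too close to the anchor
loss = triplet_loss(a, p, n)  # positive loss: the negative violates the margin
```

Minimizing this loss over many such triplets pulls same-class embeddings together and pushes different-class embeddings apart, which is the separability the abstract refers to.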