Department of Electrical and Electronic Engineering, Stellenbosch University, South Africa.
SAMRC Centre for Tuberculosis Research, DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, South Africa.
Comput Biol Med. 2022 Feb;141:105153. doi: 10.1016/j.compbiomed.2021.105153. Epub 2021 Dec 17.
We present an experimental investigation into the effectiveness of transfer learning and bottleneck feature extraction for detecting COVID-19 from audio recordings of cough, breath and speech. This type of screening is non-contact, requires neither specialist medical expertise nor laboratory facilities, and can be deployed on inexpensive consumer hardware such as a smartphone. We use datasets that contain cough, sneeze, speech and other noises, but no COVID-19 labels, to pre-train three deep neural networks: a CNN, an LSTM and a Resnet50. These pre-trained networks are subsequently either fine-tuned on smaller datasets of coughing with COVID-19 labels in a process of transfer learning, or used as bottleneck feature extractors. Results show that a Resnet50 classifier trained by this transfer learning process delivers optimal or near-optimal performance across all datasets, achieving areas under the receiver operating characteristic curve (ROC AUC) of 0.98, 0.94 and 0.92 for the three sound classes of coughs, breaths and speech respectively. This indicates that coughs carry the strongest COVID-19 signature, followed by breath and speech. Our results also show that applying transfer learning and extracting bottleneck features using the larger datasets without COVID-19 labels led not only to improved performance, but also to a marked reduction in the standard deviation of the classifier AUCs measured over the outer folds of nested cross-validation, indicating better generalisation. We conclude that deep transfer learning and bottleneck feature extraction can improve COVID-19 cough, breath and speech audio classification, yielding automatic COVID-19 detection with better and more consistent overall performance.
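To illustrate the two strategies compared above, the sketch below contrasts fine-tuning a pre-trained Resnet50 on COVID-labelled spectrograms with using the frozen network as a bottleneck feature extractor. It is a minimal sketch assuming a Keras/TensorFlow pipeline with spectrogram-image inputs; the input shape, the ImageNet initialisation (standing in for the paper's audio pre-training without COVID-19 labels), the classifier head and the optimiser settings are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch: transfer learning vs. bottleneck feature extraction with a
# pre-trained Resnet50. All shapes, weights and hyperparameters are assumed.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

SPEC_SHAPE = (224, 224, 3)   # assumed spectrogram-image input size for Resnet50

def build_pretrained_backbone():
    """Stand-in for a network pre-trained on cough/sneeze/speech audio without
    COVID-19 labels (here: ImageNet weights, purely for illustration)."""
    return tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet",
        input_shape=SPEC_SHAPE, pooling="avg")

# --- Strategy 1: transfer learning (fine-tune the pre-trained network) -------
def build_finetune_classifier():
    backbone = build_pretrained_backbone()
    backbone.trainable = True            # all layers updated on COVID-labelled coughs
    x_in = layers.Input(shape=SPEC_SHAPE)
    x = backbone(x_in)
    out = layers.Dense(1, activation="sigmoid")(x)   # COVID-19 positive/negative
    model = models.Model(x_in, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="roc_auc")])
    return model

# --- Strategy 2: bottleneck feature extraction --------------------------------
def extract_bottleneck_features(spectrograms):
    """Freeze the pre-trained network and use its pooled activations as fixed
    features for a small downstream classifier."""
    backbone = build_pretrained_backbone()
    backbone.trainable = False
    return backbone.predict(spectrograms, verbose=0)

if __name__ == "__main__":
    # Tiny synthetic batch so the sketch runs end to end.
    dummy_specs = np.random.rand(8, *SPEC_SHAPE).astype("float32")
    dummy_labels = np.random.randint(0, 2, size=(8, 1))

    clf = build_finetune_classifier()
    clf.fit(dummy_specs, dummy_labels, epochs=1, batch_size=4, verbose=0)

    feats = extract_bottleneck_features(dummy_specs)   # shape: (8, 2048)
    print("bottleneck feature shape:", feats.shape)
```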
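The spread of AUCs over the outer folds of nested cross-validation, used above as the indicator of generalisation, can be measured with a loop such as the following. This is a minimal sketch assuming scikit-learn, a logistic-regression classifier on bottleneck features, and 5 outer / 3 inner folds; none of these choices are taken from the paper.

```python
# Minimal sketch: AUC mean and standard deviation over the outer folds of
# nested cross-validation. Data, classifier and fold counts are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2048))       # stand-in for bottleneck features
y = rng.integers(0, 2, size=200)       # stand-in COVID-19 labels

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop tunes the regularisation strength; outer loop estimates AUC.
clf = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc", cv=inner_cv)

outer_aucs = cross_val_score(clf, X, y, scoring="roc_auc", cv=outer_cv)
print(f"mean AUC {outer_aucs.mean():.3f} ± {outer_aucs.std():.3f} over outer folds")
```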