Pinkas Gadi, Karny Yarden, Malachi Aviad, Barkai Galia, Bachar Gideon, Aharonson Vered
Afeka Center of Language Processing, Afeka Tel Aviv Academic College of Engineering, Tel Aviv-Yafo 6910717, Israel.
Pediatric Infectious Diseases Unit, Safra Children's Hospital, Sheba Medical Center and Sackler School of Medicine, Tel-Aviv University, Tel Aviv-Yafo 69978, Israel.
IEEE Open J Eng Med Biol. 2020 Sep 24;1:268-274. doi: 10.1109/OJEMB.2020.3026468. eCollection 2020.
Automated voice-based detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) could facilitate screening for COVID-19. A dataset of cellular phone recordings from 88 subjects was recently collected. The dataset included vocal utterances, speech and coughs that the subjects recorded themselves in either hospitals or isolation sites. All subjects underwent nasopharyngeal swabbing at the time of recording and were labelled as SARS-CoV-2 positives or negative controls. The present study used deep learning and speech processing to detect the SARS-CoV-2 positives. A three-stage architecture was implemented. First, a self-supervised attention-based transformer generated embeddings from the audio inputs. Second, recurrent neural networks were used to produce specialized sub-models for the SARS-CoV-2 classification. Third, a stacking ensemble fused the predictions of the sub-models. Pre-training, bootstrapping and regularization techniques were used to prevent overfitting. A recall of 78% and a probability of false alarm (PFA) of 41% were measured on a test set of 57 recording sessions. A leave-one-speaker-out cross-validation on 292 recording sessions yielded a recall of 78% and a PFA of 30%. These preliminary results suggest that COVID-19 screening using voice is feasible.
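The two reported metrics and the leave-one-speaker-out protocol can be illustrated with a minimal sketch. It assumes that PFA denotes the false positive rate, FP / (FP + TN), which the abstract does not define explicitly, and that label 1 marks a SARS-CoV-2 positive. The function names, speaker IDs, labels and predictions below are hypothetical toy values, not data or code from the study.

```python
from collections import defaultdict


def recall_and_pfa(y_true, y_pred):
    """Return (recall, PFA) for binary labels, where 1 = SARS-CoV-2 positive.

    Assumption: PFA is the false positive rate, FP / (FP + TN).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    pfa = fp / (fp + tn) if (fp + tn) else float("nan")
    return recall, pfa


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker.

    All recording sessions of the held-out speaker go to the test fold, so no
    speaker contributes sessions to both training and testing.
    """
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)
    for spk, test_idx in by_speaker.items():
        train_idx = [i for i, s in enumerate(speaker_ids) if s != spk]
        yield spk, train_idx, test_idx


# Hypothetical toy data: six recording sessions from three speakers.
speakers = ["s1", "s1", "s2", "s3", "s3", "s3"]
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

recall, pfa = recall_and_pfa(y_true, y_pred)
folds = list(leave_one_speaker_out(speakers))
```

Grouping folds by speaker rather than by recording session matters here because a subject may contribute several sessions; splitting at the session level could let a model recognize the speaker instead of the condition.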