Bayer AG, 13353 Berlin, Germany.
Clinic for Phoniatrics, Pedaudiology and Communication Disorders, University Hospital of RWTH Aachen, 52074 Aachen, Germany.
Sensors (Basel). 2024 Sep 24;24(19):6176. doi: 10.3390/s24196176.
Audio-based classification of body sounds has long been studied as an aid to diagnosing respiratory diseases. Most research centers on cough as the main acoustic biomarker, but other body sounds may also help detect respiratory diseases. Recent studies on coronavirus disease 2019 (COVID-19) suggest that breath and speech sounds, in addition to cough, correlate with the disease. Our study proposes fused audio instance and representation (FAIR) as a method for respiratory disease detection. FAIR constructs a joint feature vector from various body sounds represented in waveform and spectrogram form. We evaluate FAIR on COVID-19 detection. Our findings show that using self-attention to combine features extracted from cough, breath, and speech sounds yields the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.8658, a sensitivity of 0.8057, and a specificity of 0.7958. Models that use both representations achieve a higher AUC than models trained solely on spectrograms or solely on waveforms, indicating that combining the two representations enriches the extracted features. While this study focuses on COVID-19, FAIR's flexibility allows it to combine multi-modal and multi-instance features in other diagnostic applications, potentially leading to more accurate diagnoses across a wider range of diseases.
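The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings are made-up 4-dimensional vectors, the function name `self_attention_fuse` is hypothetical, and the attention uses the feature vectors themselves as queries, keys, and values (no learned projection matrices), followed by mean pooling. It only shows how six feature vectors (three body sounds in two representations) could be combined into one joint vector with scaled dot-product self-attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(tokens):
    """Fuse equal-length feature vectors into one joint vector.

    Simplifying assumption: queries, keys, and values are the tokens
    themselves, and the attended tokens are mean-pooled at the end.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for q in tokens:
        # Scaled dot-product score of this token against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        # Weighted sum of all tokens, one output dimension at a time.
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)]
        )
    n = len(attended)
    return [sum(tok[i] for tok in attended) / n for i in range(dim)]


# Hypothetical embeddings: one vector per (body sound, representation) pair.
embeddings = {
    ("cough", "waveform"): [0.9, 0.1, 0.3, 0.5],
    ("cough", "spectrogram"): [0.8, 0.2, 0.4, 0.4],
    ("breath", "waveform"): [0.2, 0.7, 0.1, 0.6],
    ("breath", "spectrogram"): [0.3, 0.6, 0.2, 0.5],
    ("speech", "waveform"): [0.5, 0.5, 0.9, 0.1],
    ("speech", "spectrogram"): [0.4, 0.4, 0.8, 0.2],
}

joint = self_attention_fuse(list(embeddings.values()))
print(joint)  # joint feature vector that a downstream classifier could consume
```

In a trained model, the queries, keys, and values would come from learned projections and the joint vector would feed a classification head; the sketch omits both to keep the attention step itself visible.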