A Deep Ensemble Neural Network with Attention Mechanisms for Lung Abnormality Classification Using Audio Inputs.

Affiliations

Department of Computer and Information Sciences, Faculty of Engineering and Environment, University of Northumbria, Newcastle upon Tyne NE1 8ST, UK.

Department of Computer Science, Royal Holloway University of London, Egham TW20 0EX, UK.

Publication information

Sensors (Basel). 2022 Jul 26;22(15):5566. doi: 10.3390/s22155566.

DOI:10.3390/s22155566

PMID:35898070

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9332569/

Abstract

Medical audio classification for lung abnormality diagnosis is a challenging problem owing to comparatively unstructured audio signals present in the respiratory sound clips. To tackle such challenges, we propose an ensemble model by incorporating diverse deep neural networks with attention mechanisms for undertaking lung abnormality and COVID-19 diagnosis using respiratory, speech, and coughing audio inputs. Specifically, four base deep networks are proposed, which include attention-based Convolutional Recurrent Neural Network (A-CRNN), attention-based bidirectional Long Short-Term Memory (A-BiLSTM), attention-based bidirectional Gated Recurrent Unit (A-BiGRU), as well as Convolutional Neural Network (CNN). A Particle Swarm Optimization (PSO) algorithm is used to optimize the training parameters of each network. An ensemble mechanism is used to integrate the outputs of these base networks by averaging the probability predictions of each class. Evaluated using respiratory ICBHI, Coswara breathing, speech, and cough datasets, as well as a combination of ICBHI and Coswara breathing databases, our ensemble model and base networks achieve ICBHI scores ranging from 0.920 to 0.9766. Most importantly, the empirical results indicate that a positive COVID-19 diagnosis can be distinguished to a high degree from other more common respiratory diseases using audio recordings, based on the combined ICBHI and Coswara breathing datasets.
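The ensemble step described in the abstract, averaging the per-class probability predictions of the base networks, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `average_ensemble`, the three-class setup, and all probability values are hypothetical and chosen only for illustration. The four arrays stand in for the softmax outputs of A-CRNN, A-BiLSTM, A-BiGRU, and CNN on the same two audio clips.

```python
import numpy as np


def average_ensemble(prob_list):
    """Average per-class probabilities from several models (soft voting).

    prob_list: list of arrays, each of shape (n_samples, n_classes),
    where each row is one model's class-probability vector for a sample.
    """
    stacked = np.stack(prob_list, axis=0)  # (n_models, n_samples, n_classes)
    return stacked.mean(axis=0)            # (n_samples, n_classes)


# Hypothetical outputs of four base networks for two clips and three classes.
# The values are made up; they are not results from the paper.
a_crnn = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]])
a_bilstm = np.array([[0.5, 0.3, 0.2], [0.2, 0.2, 0.6]])
a_bigru = np.array([[0.6, 0.1, 0.3], [0.3, 0.3, 0.4]])
cnn = np.array([[0.4, 0.4, 0.2], [0.1, 0.6, 0.3]])

avg_probs = average_ensemble([a_crnn, a_bilstm, a_bigru, cnn])

# The ensemble prediction is the class with the highest averaged probability.
pred_labels = avg_probs.argmax(axis=1)

print(avg_probs)
print(pred_labels)
```

Averaging probabilities rather than taking a majority vote over hard labels lets a confident base network outweigh several uncertain ones, as in the second clip, where one model alone favours the third class but the averaged distribution still selects the second.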
