Department of Computer Science and Electrical Engineering, University of Missouri-Kansas City, Kansas City, MO 64110, USA.
Sensors (Basel). 2022 Feb 16;22(4):1521. doi: 10.3390/s22041521.
Classifying lung or heart sounds is challenging because of the complex nature of audio data and its dynamic properties in the time and frequency domains. Detecting lung or heart conditions is also difficult when data are scarce, unbalanced, or highly noisy. Furthermore, data quality is a considerable obstacle to improving the performance of deep learning. In this paper, we propose a novel feature-based fusion network, FDC-FS, for classifying heart and lung sounds. The FDC-FS framework aims to transfer learning effectively from three different deep neural network models built on audio datasets. The innovation of the proposed transfer learning lies in transforming audio data into image vectors and fusing three specific models into a single model better suited to deep learning. We used two publicly available datasets for this study, i.e., the lung sound data from the ICBHI 2017 challenge and the heart challenge data. We applied data augmentation techniques, such as noise distortion, pitch shifting, and time stretching, to address some data issues in these datasets. Importantly, we extracted three distinct features from the audio samples, i.e., Spectrogram, MFCC, and Chromagram. Finally, we built a fusion of three optimal convolutional neural network models by feeding them the image feature vectors transformed from the audio features. We confirmed the superiority of the proposed fusion model over state-of-the-art works. The highest accuracy achieved with FDC-FS is 99.1% for Spectrogram-based lung sound classification and 97% for Spectrogram- and Chromagram-based heart sound classification.
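As a hedged illustration of the augmentation step described above, the sketch below implements two of the named techniques, noise distortion and time stretching, with NumPy alone. Pitch shifting is omitted because shifting pitch without altering duration requires a phase vocoder (in practice one would use a library routine such as `librosa.effects.pitch_shift`). All parameter values here are illustrative assumptions, not those used in the paper.

```python
import numpy as np

def add_noise(y, noise_level=0.005, seed=0):
    """Noise distortion: mix low-amplitude Gaussian noise into the signal."""
    rng = np.random.default_rng(seed)
    return y + noise_level * rng.standard_normal(len(y))

def time_stretch(y, rate=1.5):
    """Naive time stretch: resample so the clip plays `rate` times faster.
    (Audio libraries use a phase vocoder instead, which preserves pitch.)"""
    n_out = int(len(y) / rate)
    positions = np.linspace(0, len(y) - 1, n_out)
    return np.interp(positions, np.arange(len(y)), y)

# Example on a 1-second synthetic 440 Hz tone at 22.05 kHz
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
noisy = add_noise(tone)
stretched = time_stretch(tone, rate=1.5)
```

Augmentations like these are applied to the raw waveforms before feature extraction, multiplying the effective size of a small or unbalanced training set.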
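To illustrate the audio-to-image transformation at the heart of the framework, the minimal sketch below computes a magnitude spectrogram, the simplest of the three named time-frequency features, from first principles with NumPy. The paper's actual features (Spectrogram, MFCC, Chromagram) would typically be computed with a library such as librosa; the frame and hop sizes here are illustrative assumptions.

```python
import numpy as np

def spectrogram(y, n_fft=512, hop=128):
    """Frame the signal, apply a Hann window, and take |FFT| per frame.
    Returns an array of shape (n_fft // 2 + 1, n_frames): a 2-D
    time-frequency image that can be fed to a CNN."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# 1-second 100 Hz test tone at an illustrative 4 kHz sample rate
sr = 4000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 100.0 * t)
S = spectrogram(y)
# The strongest frequency row sits near bin round(100 * n_fft / sr) = 13
```

MFCCs and chromagrams are further transforms of this same short-time spectrum (a Mel filterbank plus DCT, and a pitch-class folding, respectively), which is why all three can be rendered as image vectors for the fused CNNs.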