Singh Ajay Pal, Nigam Ankita, Garg Gaurav
Department of Computer Science and Engineering, Mahakaushal University, Jabalpur-482003, India.
Department of Computer Science and Engineering, Chandigarh University, Mohali, Punjab, India.
Curr Med Imaging. 2025;21:e15734056388107. doi: 10.2174/0115734056388107250710120917.
Driven by environmental pollution and the rise in infectious diseases, the increasing prevalence of lung conditions demands advancements in diagnostic techniques.
This study explores the use of several features, namely spectrograms, chromagrams, and Mel-frequency cepstral coefficients (MFCCs), to extract crucial information from auscultation recordings. Challenges such as background noise in the recordings are addressed through filter-based audio enhancement methods. The primary goal is to improve disease detection accuracy by using convolutional neural networks (CNNs) for feature extraction and dense neural networks for classification.
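As an illustration of the first of these features, a magnitude spectrogram is the short-time Fourier transform (STFT) of the recording taken frame by frame; chroma and MFCC features are further transformations of such a spectrogram. The sketch below is a minimal, dependency-free version under stated assumptions: the sample rate, frame length, hop size, and test tone are hypothetical values chosen for the example, not parameters reported by the study, and the abstract does not state which tooling the authors used.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return one list of frame_len // 2 + 1 magnitudes per frame."""
    window = hann(frame_len)
    bins = frame_len // 2 + 1
    # Precompute the DFT basis once; a naive DFT is fine for short frames.
    basis = [
        [cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)]
        for k in range(bins)
    ]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append([abs(sum(c * b for c, b in zip(chunk, row))) for row in basis])
    return frames


# Hypothetical test input: a pure 500 Hz tone sampled at 4 kHz.
SAMPLE_RATE = 4000
TONE_HZ = 500
FRAME_LEN = 64
signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(1024)]

spec = magnitude_spectrogram(signal, frame_len=FRAME_LEN, hop=32)
# Strongest frequency bin in the first frame, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
```

In practice an audio library would normally replace the naive transform; librosa, for example, provides `librosa.stft`, `librosa.feature.chroma_stft`, and `librosa.feature.mfcc` for the three feature types named above.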
Deep learning models such as CNNs and recurrent neural networks (RNNs) outperformed traditional machine learning models such as support vector machines (SVMs), K-nearest neighbours (KNN), and random forests, with accuracies ranging from 70% to 85%. The combination of CNN, RNN, and long short-term memory (LSTM) achieved an accuracy of 88%. By integrating MFCC, chroma short-time Fourier transform (STFT), and spectrogram features with a CNN-based classifier, the proposed multi-feature deep learning model achieved the highest accuracy of 92%, surpassing all other methods.
The study effectively addresses key issues, including the overrepresentation of Chronic Obstructive Pulmonary Disease (COPD) samples relative to Lower Respiratory Tract Infection (LRTI) and Upper Respiratory Tract Infection (URTI) samples, which hampers generalization across test audio samples.
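The abstract does not say how the class imbalance was handled. One common option, shown here only as an assumed illustration, is to weight each class inversely to its frequency so that rare classes contribute as much to the training loss as the dominant one. The function name and the sample counts below are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by total / (n_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


# Hypothetical label distribution dominated by COPD recordings.
labels = ["COPD"] * 80 + ["LRTI"] * 10 + ["URTI"] * 10
weights = balanced_class_weights(labels)
```

With this weighting, every class carries the same total weight across the dataset, so a classifier can no longer reach a low loss by predicting only the majority class.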
The proposed methodology addresses common challenges such as background noise in recordings and the limited, imbalanced nature of the available datasets. These findings pave the way for enhanced clinical applications and demonstrate the potential of multi-feature deep learning methods for the classification of pulmonary diseases.