Jung Shing-Yun, Liao Chia-Hung, Wu Yu-Sheng, Yuan Shyan-Ming, Sun Chuen-Tsai
Department of Computer Science, National Chiao Tung University, Hsinchu 300, Taiwan.
Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan.
Diagnostics (Basel). 2021 Apr 20;11(4):732. doi: 10.3390/diagnostics11040732.
Lung sounds remain vital in clinical diagnosis because they reveal associations with pulmonary pathologies. With COVID-19 spreading across the world, it has become more pressing for medical professionals to leverage artificial intelligence for faster and more accurate lung auscultation. This study proposes a feature engineering process that extracts dedicated features for a depthwise separable convolutional neural network (DS-CNN) to classify lung sounds accurately and efficiently. We extracted three features for the shrunk DS-CNN model: the short-time Fourier-transformed (STFT) feature, the Mel-frequency cepstral coefficient (MFCC) feature, and the fusion of these two. DS-CNN models trained on the STFT feature alone and on the MFCC feature alone achieved accuracies of 82.27% and 73.02%, respectively, whereas fusing both features raised the accuracy to 85.74%. In addition, our method ran inference 16 times faster than RespireNet on an edge device, with only 0.45% lower accuracy. These findings indicate that combining the fused STFT and MFCC features with a DS-CNN could be a suitable model design for lightweight edge devices to achieve accurate AI-aided detection of lung diseases.
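As an illustration of the feature fusion described above, the following is a minimal NumPy sketch, not the authors' implementation. It computes a log-magnitude STFT and MFCCs from the same frames and concatenates them along the feature axis. The abstract does not state how the two features are fused, so the concatenation is an assumption, as are all function names and parameter values (`n_fft`, `hop`, `n_mels`, `n_coeffs`, and the 4 kHz sample rate in the usage note).

```python
import numpy as np


def stft_magnitude(signal, n_fft=256, hop=128):
    """Magnitude STFT with a Hann window, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Filter edges are equally spaced on the mel scale up to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m : m + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mfcc(magnitude, sample_rate, n_fft, n_mels=26, n_coeffs=13):
    """MFCCs from a magnitude STFT, shape (n_frames, n_coeffs)."""
    power = magnitude ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)  # small offset avoids log(0)
    # Unnormalized DCT-II basis applied across the mel bands.
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_mels)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel @ dct_basis.T


def fuse_features(signal, sample_rate, n_fft=256, hop=128):
    """Concatenate log-STFT and MFCC features per frame (assumed fusion)."""
    magnitude = stft_magnitude(signal, n_fft, hop)
    log_stft = np.log(magnitude + 1e-10)
    coeffs = mfcc(magnitude, sample_rate, n_fft)
    return np.concatenate([log_stft, coeffs], axis=1)
```

For a hypothetical one-second recording sampled at 4 kHz, `fuse_features(signal, sample_rate=4000)` returns an array with 30 frames and 142 features per frame (129 STFT bins plus 13 MFCCs), which could then be fed to a classifier as a single-channel image.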