Bhandari Binayak
Department of Railroad Engineering & Transport Management, Woosong University, Daejeon 300718, Korea.
Micromachines (Basel). 2021 Nov 29;12(12):1484. doi: 10.3390/mi12121484.
This study compared popular Deep Learning (DL) architectures for classifying machining surface roughness from sound and force data. The DL architectures considered were the Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and transformer. Classification was performed on the sound and force data generated while machining aluminum sheets at different levels of spindle speed, feed rate, depth of cut, and end-mill diameter, and the models were trained on 30 s of machining data (10-40 s) from the machining experiments. Since a raw audio waveform is seldom used in DL models, the Mel-Spectrogram and Mel Frequency Cepstral Coefficients (MFCCs) audio feature extraction techniques were used to prepare the model inputs. The DL models were compared on training and validation accuracy, training epochs, and training parameters. Although roughness classification was satisfactory for all the DL models (except for the CNN with Mel-Spectrogram), the transformer-based models had the highest training (>96%) and validation (≈90%) accuracies. The CNN model with Mel-Spectrogram exhibited the worst training and inference accuracy, which was influenced by the limited training data. Confusion matrices were plotted to inspect the classification accuracy visually. They showed that the transformer model trained on Mel-Spectrogram and the transformer model trained on MFCCs correctly predicted 366 (or 91.5%) and 371 (or 92.7%) out of 400 test samples, respectively. This study also highlights the suitability of the transformer model for time-series sound and force data and its superiority over the other DL models.
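The test-set percentages quoted from the confusion matrices follow from dividing the correctly classified samples (the matrix diagonal) by the total number of samples. The sketch below illustrates that arithmetic only. The function name `accuracy_from_confusion` and the 3-class toy matrix are assumptions introduced for illustration and are not data from the study; only the counts 366, 371, and 400 come from the abstract.

```python
from typing import Sequence


def accuracy_from_confusion(matrix: Sequence[Sequence[int]]) -> float:
    """Overall accuracy: diagonal (correct) count divided by all samples.

    Rows are taken as true classes and columns as predicted classes.
    """
    if any(len(row) != len(matrix) for row in matrix):
        raise ValueError("confusion matrix must be square")
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Toy 3-class matrix, invented purely for illustration (NOT from the study).
toy_matrix = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]
toy_accuracy = accuracy_from_confusion(toy_matrix)  # 25 correct out of 30

# Counts reported in the abstract: correct predictions out of 400 test samples.
TEST_SAMPLES = 400
reported_correct = {
    "transformer + Mel-Spectrogram": 366,
    "transformer + MFCCs": 371,
}
reported_accuracy = {
    name: correct / TEST_SAMPLES for name, correct in reported_correct.items()
}

if __name__ == "__main__":
    for name, acc in reported_accuracy.items():
        print(f"{name}: {acc:.2%}")
```

Carried to two decimal places, 371 out of 400 is 92.75%, which the abstract reports as 92.7%.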