Ranmal Dakshina, Ranasinghe Piumini, Paranayapa Thivindu, Meedeniya Dulani, Perera Charith
Department of Computer Science & Engineering, University of Moratuwa, Moratuwa 10400, Sri Lanka.
School of Computer Science and Informatics, Cardiff University, Cardiff CF24 3AA, UK.
Sensors (Basel). 2024 Jun 9;24(12):3749. doi: 10.3390/s24123749.
The combination of deep learning and IoT plays a significant role in modern smart solutions, enabling task-specific, real-time, offline operation with improved accuracy and minimized resource consumption. This study proposes ESC-NAS, a novel hardware-aware neural architecture search approach for designing and developing deep convolutional neural network architectures tailored to raw audio inputs in environmental sound classification applications with limited computational resources. The ESC-NAS process uses a novel cell-based neural architecture search space, built from 2D convolution, batch normalization, and max pooling layers, that can extract features directly from raw audio. A black-box Bayesian optimization search strategy explores this search space, and the resulting model architectures are evaluated through hardware simulation. Compared with existing approaches in the literature, the models obtained from the ESC-NAS process achieved an optimal trade-off between model performance and resource consumption. The ESC-NAS models achieved accuracies of 85.78%, 81.25%, 96.25%, and 81.0% on the FSC22, UrbanSound8K, ESC-10, and ESC-50 datasets, respectively, with model sizes and parameter counts suited to edge deployment.
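The abstract does not spell out the cell definition or the resource metrics, so the following is only a minimal sketch of how a hardware-aware search could score one candidate from a cell-based space of 2D convolution, batch normalization, and max pooling layers. Everything here is an assumption for illustration: the hypothetical `Cell` type and the helpers `count_params`, `output_shape`, `model_size_kb`, and `fits_budget`; a raw waveform reshaped to a (1, T, 1) tensor with T = 22050 samples; stride-1 "same" convolutions; Keras-style batch normalization with four parameters per channel; a global-average-pooling plus dense classification head; and float32 weights. The Bayesian optimization search strategy and the hardware simulation are not reproduced.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Cell:
    """Hypothetical search-space cell: Conv2D, then BatchNorm, then MaxPool."""
    filters: int                 # output channels of the 2D convolution
    kernel: Tuple[int, int]      # (height, width) of the convolution kernel
    pool: Tuple[int, int]        # (height, width) of the max-pooling window


def conv2d_params(in_ch: int, out_ch: int, kernel: Tuple[int, int]) -> int:
    # Weights (kh * kw * in_ch * out_ch) plus one bias per output channel.
    kh, kw = kernel
    return kh * kw * in_ch * out_ch + out_ch


def batchnorm_params(ch: int) -> int:
    # Keras-style BatchNorm: gamma, beta, moving mean, moving variance per channel.
    return 4 * ch


def count_params(cells: List[Cell], num_classes: int, in_ch: int = 1) -> int:
    """Total parameters of the cell stack plus an assumed
    global-average-pooling + dense classification head."""
    total, ch = 0, in_ch
    for cell in cells:
        total += conv2d_params(ch, cell.filters, cell.kernel)
        total += batchnorm_params(cell.filters)
        ch = cell.filters  # max pooling adds no parameters
    return total + ch * num_classes + num_classes


def output_shape(cells: List[Cell], input_hw: Tuple[int, int], in_ch: int = 1):
    """Feature-map shape after the cells, assuming stride-1 'same' convolutions
    (spatial size unchanged) and non-overlapping max pooling (floor division)."""
    h, w = input_hw
    ch = in_ch
    for cell in cells:
        ph, pw = cell.pool
        if ph > h or pw > w:
            raise ValueError(f"pool {cell.pool} exceeds feature map {(h, w)}")
        h, w, ch = h // ph, w // pw, cell.filters
    return h, w, ch


def model_size_kb(num_params: int, bytes_per_param: int = 4) -> float:
    # 4 bytes per float32 weight; an int8-quantized model would use 1.
    return num_params * bytes_per_param / 1024


def fits_budget(cells: List[Cell], num_classes: int, max_params: int) -> bool:
    # Hardware-aware filter: reject candidates above a parameter budget.
    return count_params(cells, num_classes) <= max_params


# Example candidate for a raw waveform reshaped to a (1, T, 1) tensor.
cells = [Cell(8, (1, 9), (1, 4)), Cell(16, (1, 9), (1, 4)), Cell(32, (1, 5), (1, 4))]
n = count_params(cells, num_classes=50)       # 50 classes, as in ESC-50
print(n, model_size_kb(n), output_shape(cells, (1, 22050)))
# prints: 5714 22.3203125 (1, 344, 32)
```

In a search loop, a strategy such as Bayesian optimization would propose `Cell` lists, discard those that fail `fits_budget`, and train only the remaining candidates to measure accuracy, which is one way to trade model performance against resource consumption.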