Evaluating the Performance of Pre-Trained Convolutional Neural Network for Audio Classification on Embedded Systems for Anomaly Detection in Smart Cities.

Affiliations

Department of Engineering Sciences and Technology (INDI), Vrije Universiteit Brussel (VUB), 1050 Brussels, Belgium.

SIGL Laboratory, National School of Applied Sciences of Tetuan, Abdelmalek Essaadi University, Tetuan 93000, Morocco.

Publication information

Sensors (Basel). 2023 Jul 7;23(13):6227. doi: 10.3390/s23136227.

DOI:10.3390/s23136227

PMID:37448075

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10347208/

Abstract

Environmental Sound Recognition (ESR) plays a crucial role in smart cities by accurately categorizing audio using well-trained Machine Learning (ML) classifiers. This application is particularly valuable for cities that analyze environmental sounds to gain insights and data. However, deploying deep learning (DL) models on resource-constrained embedded devices, such as Raspberry Pi (RPi) or Tensor Processing Units (TPUs), poses challenges. In this work, we evaluate an existing pre-trained model for deployment on RPi and TPU platforms in addition to a laptop. We explored the impact of the retraining parameters and compared the sound classification performance across three datasets: ESC-10, BDLib, and Urban Sound. Our results demonstrate the effectiveness of the pre-trained model for transfer learning in embedded systems. On the laptop, the accuracy rates reached 96.6% for ESC-10, 100% for BDLib, and 99% for Urban Sound. On RPi, the accuracy rates were 96.4% for ESC-10, 100% for BDLib, and 95.3% for Urban Sound, while on RPi with Coral TPU, the rates were 95.7% for ESC-10, 100% for BDLib, and 95.4% for Urban Sound. Utilizing pre-trained models reduces the computational requirements, enabling faster inference. Leveraging pre-trained models in embedded systems can speed up the development and deployment of various real-time applications and improve their performance.
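
To make the cross-platform comparison easier to read, the reported accuracies can be turned into a loss relative to the laptop baseline. The sketch below is only an illustration built from the figures quoted in the abstract; the names `ACCURACY` and `accuracy_drop` are hypothetical and do not come from the paper.

```python
# Accuracy figures (percent) as quoted in the abstract.
# The dictionary layout and function name are illustrative assumptions.
ACCURACY = {
    "laptop": {"ESC-10": 96.6, "BDLib": 100.0, "Urban Sound": 99.0},
    "RPi": {"ESC-10": 96.4, "BDLib": 100.0, "Urban Sound": 95.3},
    "RPi + Coral TPU": {"ESC-10": 95.7, "BDLib": 100.0, "Urban Sound": 95.4},
}


def accuracy_drop(platform, baseline="laptop"):
    """Return the per-dataset accuracy loss, in percentage points,
    of `platform` relative to `baseline`."""
    return {
        dataset: round(ACCURACY[baseline][dataset] - acc, 1)
        for dataset, acc in ACCURACY[platform].items()
    }


if __name__ == "__main__":
    for platform in ("RPi", "RPi + Coral TPU"):
        print(platform, accuracy_drop(platform))
```

Read this way, the embedded platforms lose less than one percentage point on ESC-10, nothing on BDLib, and roughly 3.6 to 3.7 points on Urban Sound.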
