Peng Peng, Jiang Kai, You Mingyu, Xie Jialin, Zhou Hongjun, Xu Weisheng, Lu Jicheng, Li Xiayu, Xu Yun
IEEE Trans Biomed Circuits Syst. 2023 Feb;17(1):116-128. doi: 10.1109/TBCAS.2023.3236976. Epub 2023 Mar 30.
Precise, automatic detection of cough sounds is of vital clinical importance. However, privacy considerations prohibit transmitting raw audio data to the cloud, so there is strong demand for an efficient, accurate, and low-cost solution on the edge device. To address this challenge, we propose a semi-custom software-hardware co-design methodology for building a cough detection system. Specifically, we first design a scalable and compact convolutional neural network (CNN) structure from which many network instances can be generated. Second, we develop a dedicated hardware accelerator to perform inference efficiently, and then apply network design space exploration to find the optimal network instance. Finally, we compile the optimal network and run it on the hardware accelerator. The experimental results demonstrate that our model achieves 88.8% classification accuracy, 91.2% sensitivity, 86.5% specificity, and 86.5% precision, with a computational complexity of only 1.09 M multiply-accumulate (MAC) operations. Additionally, when implemented on a lightweight field programmable gate array (FPGA), the complete cough detection system occupies only 7.9 K lookup tables (LUTs), 12.9 K flip-flops (FFs), and 41 digital signal processing (DSP) slices, delivering an actual inference throughput of 8.3 GOP/s with a total power dissipation of 0.93 W. This framework meets the needs of certain applications and can easily be extended to or integrated into other healthcare applications.
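The four reported metrics follow the standard binary-classification definitions. The sketch below computes them from a confusion matrix, assuming cough is treated as the positive class. The `ConfusionCounts` and `binary_metrics` names and the counts in the usage example are hypothetical and are not taken from the paper's evaluation data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    # Assumption: "cough" is the positive class.
    tp: int  # cough segments correctly detected as cough
    fn: int  # cough segments missed by the detector
    tn: int  # non-cough segments correctly rejected
    fp: int  # non-cough segments wrongly flagged as cough


def binary_metrics(c: ConfusionCounts) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # recall on the cough class
        "specificity": c.tn / (c.tn + c.fp),  # recall on the non-cough class
        "precision": c.tp / (c.tp + c.fp),    # fraction of cough alarms that are real
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the paper's data.
    counts = ConfusionCounts(tp=90, fn=10, tn=85, fp=15)
    for name, value in binary_metrics(counts).items():
        print(f"{name}: {value:.3f}")
```

Note that sensitivity and specificity depend only on one class each, whereas accuracy and precision also depend on the class balance of the test set, which is why all four are usually reported together.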
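The complexity, throughput, and power figures can be combined into rough per-inference estimates. The sketch below is a back-of-envelope calculation under stated assumptions, not a set of results reported by the authors: it assumes one MAC counts as two operations, that the measured 8.3 GOP/s is sustained across the whole network, and that the full 0.93 W system power is charged to inference. The `estimate` function and `InferenceEstimate` type are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class InferenceEstimate:
    ops_per_inference: float
    latency_s: float
    inferences_per_s: float
    energy_per_inference_j: float
    efficiency_gops_per_w: float


def estimate(macs_per_inference: float, throughput_gops: float,
             power_w: float, ops_per_mac: int = 2) -> InferenceEstimate:
    # Assumption: one MAC = one multiply + one add = two operations.
    ops = macs_per_inference * ops_per_mac
    # Assumption: the measured throughput holds for every layer of the network.
    latency = ops / (throughput_gops * 1e9)
    return InferenceEstimate(
        ops_per_inference=ops,
        latency_s=latency,
        inferences_per_s=1.0 / latency,
        # Assumption: the total system power is attributed to inference.
        energy_per_inference_j=power_w * latency,
        efficiency_gops_per_w=throughput_gops / power_w,
    )


if __name__ == "__main__":
    est = estimate(macs_per_inference=1.09e6, throughput_gops=8.3, power_w=0.93)
    print(f"ops/inference:     {est.ops_per_inference:.3g}")
    print(f"latency:           {est.latency_s * 1e3:.3f} ms")
    print(f"inferences/second: {est.inferences_per_s:.0f}")
    print(f"energy/inference:  {est.energy_per_inference_j * 1e3:.3f} mJ")
    print(f"efficiency:        {est.efficiency_gops_per_w:.2f} GOP/s/W")
```

Under these assumptions the numbers work out to roughly 0.26 ms and 0.24 mJ per inference and about 8.9 GOP/s/W. Because the 0.93 W figure covers the complete system, the energy attributable to the accelerator alone would likely be lower.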