IEEE Trans Neural Netw Learn Syst. 2022 Jul;33(7):2853-2866. doi: 10.1109/TNNLS.2020.3046452. Epub 2022 Jul 6.
Real-time in situ image analytics imposes stringent latency requirements on intelligent neural network inference operations. While conventional software implementations on graphics processing unit (GPU)-accelerated platforms are flexible and achieve very high inference throughput, they are not suitable for latency-sensitive applications where real-time feedback is needed. Here, we demonstrate that high-performance reconfigurable computing platforms based on field-programmable gate array (FPGA) processing can successfully bridge the gap between low-level hardware processing and high-level intelligent image analytics algorithm deployment within a unified system. The proposed design performs inference on a stream of individual images as they are produced, and its deeply pipelined hardware architecture allows all layers of a quantized convolutional neural network (QCNN) to compute concurrently on partial image inputs. Using label-free classification of human peripheral blood mononuclear cell (PBMC) subtypes as a proof-of-concept illustration, our system achieves an ultralow classification latency of 34.2 μs with over 95% end-to-end accuracy using a QCNN, while the cells are imaged at a throughput exceeding 29,200 cells/s. Our QCNN design is modular and readily adaptable to other QCNNs with different latency and resource requirements.
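The two ideas the abstract leans on — weight quantization and layer computation beginning before the full image arrives — can be illustrated with a minimal software sketch. The code below is a hypothetical Python model, not the authors' FPGA implementation: `quantize` applies a simple uniform symmetric 8-bit scheme (an assumption; the paper's quantization details are not given here), and `stream_conv` mimics an FPGA line buffer by emitting each convolution output row as soon as enough input rows have streamed in, rather than waiting for the complete image.

```python
import numpy as np

def quantize(w, bits=8):
    # Uniform symmetric quantization to signed integers (illustrative scheme,
    # not necessarily the one used in the paper's QCNN).
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale).astype(np.int32), scale

def stream_conv(rows, kernel):
    # Emit each output row as soon as kh input rows have arrived,
    # mimicking an FPGA line buffer: computation overlaps with image readout.
    kh, kw = kernel.shape
    buf = []
    for row in rows:                # rows arrive one at a time (streaming input)
        buf.append(row)
        if len(buf) >= kh:
            window = np.stack(buf[-kh:])       # sliding window of the last kh rows
            out = [int(np.sum(window[:, c:c + kw] * kernel))
                   for c in range(window.shape[1] - kw + 1)]
            yield out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(6, 6))        # toy 6x6 "image"
k_q, scale = quantize(rng.standard_normal((3, 3)))

# The first output row is available after only 3 of 6 input rows have arrived.
out_rows = list(stream_conv(iter(img), k_q))
print(len(out_rows), len(out_rows[0]))
```

Chaining several such generators, one per layer, gives the deeply pipelined behavior the paper describes: every layer consumes its predecessor's rows as they appear, so end-to-end latency is set by pipeline depth rather than by full-frame processing time.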