Zang Zhenya, Wang Quan, Pan Mingliang, Zhang Yuanzhe, Chen Xi, Li Xingda, Li David Day Uei
Department of Biomedical Engineering, University of Strathclyde, Glasgow, United Kingdom.
Comput Methods Programs Biomed. 2025 Jan;258:108471. doi: 10.1016/j.cmpb.2024.108471. Epub 2024 Oct 28.
This study proposes a compact deep learning (DL) architecture and a highly parallelized computing hardware platform to reconstruct the blood flow index (BFi) in diffuse correlation spectroscopy (DCS). We used a rigorous analytical model to generate autocorrelation functions (ACFs) for training the DL network, and we assessed the accuracy of the proposed DL model on simulated data and milk phantom data. In an evaluation on synthetic data, our lightweight DL architecture reduced the mean squared error (MSE) by 66.7% for BFi and by 18.5% for the coherence factor β compared with a convolutional neural network (CNN). We also compared the accuracy of relative BFi (rBFi) across different algorithms. With hardware implementation in mind, we further simplified the DL computing primitives by using subtraction for feature extraction. We extensively explored computing parallelism and fixed-point quantization within the DL architecture. Because the DL model is compact, we applied unrolling and pipelining optimizations to its computation-intensive for-loops while storing all learned parameters in on-chip block RAMs (BRAMs). We also achieved pixel-wise parallelism, enabling simultaneous, real-time processing of 10 and 15 ACFs on Zynq-7000 and Zynq UltraScale+ field-programmable gate arrays (FPGAs), respectively. Unlike existing FPGA accelerators that compute BFi and β from ACFs on standalone hardware, our approach is an encapsulated, end-to-end on-chip process that converts intensity photon data into the temporal intensity ACF and then reconstructs BFi and β. This hardware platform provides an on-chip solution that replaces post-processing and can help miniaturize modern DCS systems that use single-photon cameras. We also comprehensively compared the computational efficiency of our FPGA accelerator with that of CPU and GPU solutions.
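The abstract names two computational steps around the network: forming the temporal intensity ACF from photon counts, and generating training ACFs from an analytical model. The sketch below illustrates both in plain Python under stated assumptions. The abstract does not say which correlator or analytical model the authors used, so this sketch assumes a simple linear-lag g2 estimator and the widely used semi-infinite homogeneous-medium solution of the correlation diffusion equation, combined with the Siegert relation g2 = 1 + β·g1². The function names and all default optical parameters (μa, μs′, source-detector separation ρ, wavelength, refractive index) are hypothetical, illustrative choices and not the paper's settings.

```python
import math

# Hypothetical sketch: a linear-lag g2 estimator and a semi-infinite-medium
# forward model. Both are assumptions, not the paper's exact method.


def g2_from_counts(counts, max_lag):
    """Estimate g2(k) for integer lags k = 1..max_lag from photon counts.

    g2(k) = <n(t) n(t+k)> / (<n(t)> <n(t+k)>), averaged over the
    overlapping samples. Assumes each window has a nonzero mean count.
    """
    n = len(counts)
    g2 = []
    for k in range(1, max_lag + 1):
        m = n - k  # number of overlapping sample pairs at lag k
        num = sum(counts[i] * counts[i + k] for i in range(m)) / m
        mean_a = sum(counts[:m]) / m
        mean_b = sum(counts[k:]) / m
        g2.append(num / (mean_a * mean_b))
    return g2


def effective_reflection(n_rel):
    """Widely used empirical fit for the effective reflection coefficient."""
    return -1.440 / n_rel**2 + 0.710 / n_rel + 0.668 + 0.0636 * n_rel


def g2_semi_infinite(tau, bfi, beta, mua=0.01, musp=1.0, rho=10.0,
                     wavelength=785e-6, n_tissue=1.33):
    """g2(tau) = 1 + beta * g1(tau)**2 for a semi-infinite homogeneous medium.

    Units are mm and s: mua and musp in 1/mm, rho and wavelength in mm,
    bfi (alpha * D_B) in mm^2/s. Defaults are illustrative only.
    """
    k0 = 2 * math.pi * n_tissue / wavelength    # wavenumber in the medium
    z0 = 1.0 / musp                             # isotropic source depth
    r_eff = effective_reflection(n_tissue)
    zb = 2 * (1 + r_eff) / (3 * musp * (1 - r_eff))  # extrapolated boundary
    r1 = math.sqrt(rho**2 + z0**2)                   # distance to real source
    rb = math.sqrt(rho**2 + (z0 + 2 * zb)**2)        # distance to image source

    def big_g1(t):
        # Brownian-motion model: <dr^2(t)> = 6 * D_B * t, with BFi = alpha * D_B
        k = math.sqrt(3 * mua * musp + 6 * musp**2 * k0**2 * bfi * t)
        return math.exp(-k * r1) / r1 - math.exp(-k * rb) / rb

    g1 = big_g1(tau) / big_g1(0.0)  # normalized field ACF
    return 1.0 + beta * g1**2       # Siegert relation


if __name__ == "__main__":
    taus = [1e-7, 1e-6, 1e-5, 1e-4]
    print([round(g2_semi_infinite(t, bfi=1e-6, beta=0.5), 4) for t in taus])
    print(g2_from_counts([1, 0] * 4, max_lag=2))
```

Sweeping `bfi` and `beta` over a grid of lag times in this way yields labeled (ACF, BFi, β) pairs of the kind used for training; in hardware, the floating-point loops above would correspond to the unrolled, pipelined, fixed-point datapath the abstract describes.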