Su Peng, Li Yuhang, Lu Zhonghai, Chen Dejiu
Department of Engineering Design, KTH Royal Institute of Technology, 10044 Stockholm, Sweden.
Thrust of Microelectronics, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511458, China.
Sensors (Basel). 2025 Jul 5;25(13):4196. doi: 10.3390/s25134196.
With the advance of Artificial Intelligence, Deep Neural Networks are widely employed in sensor-based systems to analyze operational conditions. However, because neural networks are inherently nondeterministic and probabilistic, assuring overall system performance can be challenging. In particular, soft errors can weaken the robustness of such networks and thereby threaten system safety. Conventional fault-tolerant techniques based on hardware redundancy and software correction mechanisms often involve a difficult trade-off between effectiveness and scalability when addressing the extensive design space of Deep Neural Networks. In this work, we propose a Reinforcement-Learning-based approach that protects neural networks from soft errors by identifying and protecting the vulnerable bits. The approach consists of three key steps: (1) analyzing the layer-wise resiliency of Deep Neural Networks through fault injection simulation; (2) generating layer-wise bit masks with a Reinforcement-Learning-based agent to reveal the vulnerable bits and protect them; and (3) synthesizing and deploying bit masks across the network, with guaranteed operational efficiency, by adopting transfer learning. As a case study, we select several existing neural networks to test and validate the design. The proposed approach is compared with baseline methods, including Hamming code and Most Significant Bits protection schemes. Specifically, we observe that the proposed method achieves a performance gain of at least 10% to 15% over the other methods on the test network. The results indicate that the proposed method protects the vulnerable bits dynamically and efficiently compared with the baseline methods.
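To make the notions of fault injection and bit-mask protection in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes float32 weights, independent bit flips at a fixed bit error rate, and models a "protected" bit as one that is restored from a clean copy. The function names (`flip_bits`, `inject_faults`, `apply_protection`) and the constant `MSB_MASK` are hypothetical, and the Reinforcement-Learning agent that would choose the mask is not shown; a fixed sign-plus-exponent mask stands in for a Most Significant Bits style baseline.

```python
import numpy as np

# Assumption: protect the sign bit and the 8 exponent bits of each float32
# weight, as a stand-in for a Most Significant Bits style baseline.
MSB_MASK = np.uint32(0xFF800000)


def flip_bits(weights, flip_mask):
    """XOR the raw 32-bit pattern of each weight with flip_mask."""
    raw = np.asarray(weights, dtype=np.float32).view(np.uint32)
    mask = np.asarray(flip_mask, dtype=np.uint32)
    return (raw ^ mask).view(np.float32)


def inject_faults(weights, bit_error_rate, rng):
    """Flip every bit of every weight independently with the given probability."""
    w = np.asarray(weights, dtype=np.float32)
    flips = rng.random(w.shape + (32,)) < bit_error_rate
    bit_values = np.uint32(1) << np.arange(32, dtype=np.uint32)
    # Combine the selected single-bit values into one 32-bit mask per weight.
    per_weight = np.bitwise_or.reduce(
        np.where(flips, bit_values, np.uint32(0)), axis=-1
    )
    return flip_bits(w, per_weight)


def apply_protection(golden, faulty, protect_mask):
    """Take protected bit positions from the clean copy, the rest from the faulty one."""
    g = np.asarray(golden, dtype=np.float32).view(np.uint32)
    f = np.asarray(faulty, dtype=np.float32).view(np.uint32)
    m = np.uint32(protect_mask)
    return ((g & m) | (f & ~m)).view(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = rng.standard_normal(8).astype(np.float32)  # toy "layer" of weights
    faulty = inject_faults(layer, bit_error_rate=0.01, rng=rng)
    repaired = apply_protection(layer, faulty, MSB_MASK)
    print("max abs error, unprotected:", np.max(np.abs(faulty - layer)))
    print("max abs error, MSB-protected:", np.max(np.abs(repaired - layer)))
```

In this toy model, a layer-wise mask is just one 32-bit integer per layer, so a learned mask could be swapped in for `MSB_MASK` without changing the rest of the code.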