Hou Xujia, Zhang Feihu, Gulati Dhiraj, Tan Tingfeng, Zhang Wei
School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, China.
Siemens EDA, Munich, Germany.
Front Neurorobot. 2023 Oct 26;17:1277160. doi: 10.3389/fnbot.2023.1277160. eCollection 2023.
Common RGB-D, CMOS, and CCD-based cameras suffer from motion blur and incorrect exposure under high-speed motion and poor lighting conditions. Event cameras, developed on bionic principles, offer low latency, high dynamic range, and freedom from motion blur. However, their unique data representation poses significant obstacles to practical application. Event-based image reconstruction algorithms address this by converting a stream of "events" into conventional frames so that existing vision algorithms can be applied, and the rapid development of neural networks has brought major breakthroughs to this field in the past few years. Building on the most popular Events-to-Video (E2VID) method, this study designs a new network called E2VIDX. The proposed network uses group convolution and sub-pixel convolution, which not only achieve better feature fusion but also reduce the model size by 25%. Furthermore, we propose a new loss function with two parts: the first compares high-level features and the second compares low-level features of the reconstructed image. The experimental results clearly outperform the state-of-the-art method: compared with the original method, Structural Similarity (SSIM) increases by 1.3%, Learned Perceptual Image Patch Similarity (LPIPS) decreases by 1.7%, Mean Squared Error (MSE) decreases by 2.5%, and the network runs faster on both GPU and CPU. Additionally, we evaluate E2VIDX on downstream image classification, object detection, and instance segmentation. The experiments show that frames reconstructed with our method allow event cameras to use existing vision algorithms directly in most scenarios.
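For illustration, the following is a minimal PyTorch sketch of an upsampling block that combines the two ingredients the abstract names, group convolution and sub-pixel convolution. The module name, channel counts, and hyperparameters (groups, scale) are assumptions for the example, not the paper's exact E2VIDX configuration.

```python
# Minimal sketch: grouped conv for cheap feature fusion followed by
# sub-pixel convolution (PixelShuffle) for upsampling. Illustrative only.
import torch
import torch.nn as nn

class GroupUpsampleBlock(nn.Module):
    """Grouped 3x3 conv, then sub-pixel upsampling (hypothetical block name)."""
    def __init__(self, in_ch: int, out_ch: int, scale: int = 2, groups: int = 4):
        super().__init__()
        # Grouped convolution splits channels into `groups` independent paths,
        # cutting the layer's parameter count roughly by a factor of `groups`.
        self.fuse = nn.Conv2d(in_ch, out_ch * scale ** 2, kernel_size=3,
                              padding=1, groups=groups)
        # Sub-pixel convolution: rearrange channels into spatial resolution
        # instead of a transposed conv (avoids checkerboard artifacts).
        self.shuffle = nn.PixelShuffle(scale)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.shuffle(self.fuse(x)))

# Example: upsample a 64-channel feature map from 32x32 to 64x64.
feat = torch.randn(1, 64, 32, 32)
block = GroupUpsampleBlock(in_ch=64, out_ch=32)
print(block(feat).shape)  # torch.Size([1, 32, 64, 64])
```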
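The two-part loss can likewise be sketched as a weighted sum of a high-level term computed on frozen deep features and a low-level pixel-wise term. The choice of VGG16 features, the layer cut-off, and the weight `alpha` below are assumptions; the abstract does not specify these details.

```python
# Hedged sketch of a two-part reconstruction loss: a high-level (perceptual)
# term on deep VGG features plus a low-level (pixel-wise) term. The feature
# extractor, cut-off layer, and `alpha` are assumed, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class TwoPartLoss(nn.Module):
    def __init__(self, alpha: float = 0.5):
        super().__init__()
        # Frozen VGG16 trunk up to relu3_3 as the high-level feature extractor.
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16]
        for p in vgg.parameters():
            p.requires_grad = False
        self.vgg = vgg.eval()
        self.alpha = alpha  # balance between the two terms (assumed value)

    def forward(self, recon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # High-level part: distance between deep features of the two images.
        # VGG expects 3-channel input; replicate grayscale reconstructions.
        r3, t3 = recon.repeat(1, 3, 1, 1), target.repeat(1, 3, 1, 1)
        high = F.l1_loss(self.vgg(r3), self.vgg(t3))
        # Low-level part: direct pixel-wise error on the reconstruction.
        low = F.mse_loss(recon, target)
        return self.alpha * high + (1 - self.alpha) * low
```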