Choi Minho, Xiang Jinlin, Wirth-Singh Anna, Baek Seung-Hwan, Shlizerman Eli, Majumdar Arka
Department of Electrical and Computer Engineering, University of Washington, Seattle, 98103, WA, USA.
Department of Physics, University of Washington, Seattle, 98103, WA, USA.
Nat Commun. 2025 Jul 1;16(1):5623. doi: 10.1038/s41467-025-61338-4.
Artificial neural networks have fundamentally transformed the field of computer vision, providing unprecedented performance. However, neural networks for image processing demand substantial computational resources, often hindering real-time operation. In this work, we demonstrate an optical encoder that performs convolution simultaneously in three color channels during image capture, effectively implementing several initial convolutional layers of the network. This optical encoding yields a ~24,000× reduction in computational operations while achieving state-of-the-art classification accuracy (~73.2%) for a free-space optical system. In addition, our analog optical encoder, trained on CIFAR-10 data, can be transferred without modification to High-10, a subset of ImageNet, and still exhibits moderate accuracy. The proposed method can decrease total system-level energy by more than two orders of magnitude per object classification. Our results demonstrate the potential of hybrid optical/digital computer vision systems, in which an optical frontend pre-processes an ambient scene to reduce the energy and latency of the whole system.
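As a rough illustration of the per-channel convolution described in the abstract, the sketch below is a purely digital stand-in in which each color channel of an image is convolved with its own kernel. It is not the authors' implementation: the function name `optical_encode`, the 5×5 kernel size, and the assumption of incoherent imaging (which makes each intensity point-spread function non-negative) are illustrative choices that the abstract does not specify.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def optical_encode(image, psfs):
    """Convolve each color channel with its own kernel ('valid' mode).

    image: array of shape (H, W, C), e.g. C = 3 for RGB.
    psfs:  array of shape (C, kh, kw), one non-negative kernel per channel
           (assumption: an incoherent system has non-negative intensity PSFs).
    Returns an array of shape (H - kh + 1, W - kw + 1, C).
    """
    image = np.asarray(image, dtype=float)
    psfs = np.asarray(psfs, dtype=float)
    if image.ndim != 3 or psfs.ndim != 3 or image.shape[2] != psfs.shape[0]:
        raise ValueError("expected image (H, W, C) and psfs (C, kh, kw)")
    if np.any(psfs < 0):
        raise ValueError("intensity PSFs are assumed to be non-negative")

    kh, kw = psfs.shape[1:]
    channels = []
    for c in range(image.shape[2]):
        # All kh x kw patches of this channel: shape (H-kh+1, W-kw+1, kh, kw).
        windows = sliding_window_view(image[:, :, c], (kh, kw))
        # A true convolution flips the kernel along both axes.
        flipped = psfs[c, ::-1, ::-1]
        channels.append(np.einsum("ijkl,kl->ij", windows, flipped))
    return np.stack(channels, axis=-1)


# Hypothetical usage on a CIFAR-10-sized random image with random kernels.
rng = np.random.default_rng(0)
scene = rng.random((32, 32, 3))
kernels = rng.random((3, 5, 5))
encoded = optical_encode(scene, kernels)
print(encoded.shape)  # (28, 28, 3)
```

In the hybrid system the abstract describes, this step happens in the optics during image capture, so the digital backend receives already-convolved data; the sketch only mimics that arithmetic in software.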