Cheng Junwei, Huang Chaoran, Zhang Jialong, Wu Bo, Zhang Wenkai, Liu Xinyu, Zhang Jiahui, Tang Yiyi, Zhou Hailong, Zhang Qiming, Gu Min, Dong Jianji, Zhang Xinliang
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China.
Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, 999077, China.
Nat Commun. 2024 Jul 23;15(1):6189. doi: 10.1038/s41467-024-50677-3.
Multimodal deep learning plays a pivotal role in supporting the processing and learning of diverse data types within the realm of artificial intelligence generated content (AIGC). However, most photonic neuromorphic processors for deep learning can handle only a single data modality (either vision or audio) because abundant parameter training in the optical domain is lacking. Here, we propose and demonstrate a trainable diffractive optical neural network (TDONN) chip based on on-chip diffractive optics with massive tunable elements to address these constraints. The TDONN chip comprises one input layer, five hidden layers, and one output layer, and only one forward propagation is required to obtain the inference results without frequent optical-electrical conversion. A customized stochastic gradient descent algorithm and a dropout mechanism are developed for photonic neurons to realize in situ training and fast convergence in the optical domain. The TDONN chip achieves a potential throughput of 217.6 tera-operations per second (TOPS) with high computing density (447.7 TOPS/mm²), high system-level energy efficiency (7.28 TOPS/W), and low optical latency (30.2 ps). The TDONN chip has successfully implemented four-class classification across different modalities (vision, audio, and touch) and achieves 85.7% accuracy on multimodal test sets. Our work opens up a new avenue for multimodal deep learning with integrated photonic processors, providing a potential solution for low-power large AI models using photonic technology.
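The abstract describes in situ training of photonic neurons with a customized stochastic gradient descent algorithm and a dropout mechanism. The sketch below is a minimal toy illustration of that idea, not the authors' implementation: the on-chip forward pass is replaced by a fixed random mixing matrix, gradients are estimated by perturb-and-measure finite differences (a common in situ strategy), and dropout freezes a random subset of tunable phases each step. All names, sizes, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the chip: N_PHASES tunable phase shifters feeding a fixed
# "diffractive" mixing, read out by N_CLASSES photodetectors (assumed setup).
N_PHASES = 32
N_CLASSES = 4
W = rng.standard_normal((N_CLASSES, N_PHASES))

def forward(phases, x):
    """Detector intensities for input x under the current phase settings."""
    field = x * np.exp(1j * phases)      # phase modulation of the input field
    out = W @ field                      # fixed mixing, stand-in for diffraction
    return np.abs(out) ** 2              # photodetectors measure intensity

def loss(phases, x, label):
    p = forward(phases, x)
    p = p / p.sum()                      # normalise detector powers
    return -np.log(p[label] + 1e-12)     # cross-entropy on the target port

def in_situ_sgd_step(phases, x, label, lr=0.1, eps=1e-3, drop=0.5):
    """One perturb-and-measure SGD step with dropout on photonic neurons."""
    active = rng.random(phases.size) > drop   # dropped phases stay frozen
    grad = np.zeros_like(phases)
    base = loss(phases, x, label)
    for i in np.flatnonzero(active):
        trial = phases.copy()
        trial[i] += eps                       # dither one phase shifter
        grad[i] = (loss(trial, x, label) - base) / eps
    return phases - lr * grad

phases = rng.uniform(0, 2 * np.pi, N_PHASES)
x = rng.random(N_PHASES)
l0 = loss(phases, x, 0)
for _ in range(50):
    phases = in_situ_sgd_step(phases, x, 0)
print(loss(phases, x, 0) < l0)
```

Because the gradient is measured rather than back-propagated, this style of update needs only forward passes, matching the abstract's point that inference and training stay in the optical domain without frequent optical-electrical conversion.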