Cheng Junwei, Huang Chaoran, Zhang Jialong, Wu Bo, Zhang Wenkai, Liu Xinyu, Zhang Jiahui, Tang Yiyi, Zhou Hailong, Zhang Qiming, Gu Min, Dong Jianji, Zhang Xinliang
Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China.
Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, 999077, China.
Nat Commun. 2024 Jul 23;15(1):6189. doi: 10.1038/s41467-024-50677-3.
Multimodal deep learning plays a pivotal role in supporting the processing and learning of diverse data types within the realm of artificial intelligence generated content (AIGC). However, most photonic neuromorphic processors for deep learning can handle only a single data modality (either vision or audio) because abundant parameter training in the optical domain is lacking. Here, we propose and demonstrate a trainable diffractive optical neural network (TDONN) chip based on on-chip diffractive optics with massive tunable elements to address these constraints. The TDONN chip comprises one input layer, five hidden layers, and one output layer, and only one forward propagation is required to obtain the inference results without frequent optical-electrical conversion. A customized stochastic gradient descent algorithm and a dropout mechanism are developed for photonic neurons to realize in situ training and fast convergence in the optical domain. The TDONN chip achieves a potential throughput of 217.6 tera-operations per second (TOPS) with high computing density (447.7 TOPS/mm²), high system-level energy efficiency (7.28 TOPS/W), and low optical latency (30.2 ps). The TDONN chip has successfully implemented four-class classification across different modalities (vision, audio, and touch) and achieves 85.7% accuracy on multimodal test sets. Our work opens up a new avenue for multimodal deep learning with integrated photonic processors, providing a potential solution for low-power large AI models using photonic technology.
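The abstract describes in situ training of photonic neurons with a customized stochastic gradient descent algorithm and a dropout mechanism. The sketch below is a minimal toy illustration of that idea, not the authors' implementation: the on-chip forward pass is replaced by a fixed random mixing matrix, gradients are estimated by perturb-and-measure finite differences (a common in situ strategy), and dropout freezes a random subset of tunable phases each step. All names, sizes, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the chip: N_PHASES tunable phase shifters feeding a fixed
# "diffractive" mixing, read out by N_CLASSES photodetectors (assumed setup).
N_PHASES = 32
N_CLASSES = 4
W = rng.standard_normal((N_CLASSES, N_PHASES))

def forward(phases, x):
    """Detector intensities for input x under the current phase settings."""
    field = x * np.exp(1j * phases)      # phase modulation of the input field
    out = W @ field                      # fixed mixing, stand-in for diffraction
    return np.abs(out) ** 2              # photodetectors measure intensity

def loss(phases, x, label):
    p = forward(phases, x)
    p = p / p.sum()                      # normalise detector powers
    return -np.log(p[label] + 1e-12)     # cross-entropy on the target port

def in_situ_sgd_step(phases, x, label, lr=0.1, eps=1e-3, drop=0.5):
    """One perturb-and-measure SGD step with dropout on photonic neurons."""
    active = rng.random(phases.size) > drop   # dropped phases stay frozen
    grad = np.zeros_like(phases)
    base = loss(phases, x, label)
    for i in np.flatnonzero(active):
        trial = phases.copy()
        trial[i] += eps                       # dither one phase shifter
        grad[i] = (loss(trial, x, label) - base) / eps
    return phases - lr * grad

phases = rng.uniform(0, 2 * np.pi, N_PHASES)
x = rng.random(N_PHASES)
l0 = loss(phases, x, 0)
for _ in range(50):
    phases = in_situ_sgd_step(phases, x, 0)
print(loss(phases, x, 0) < l0)
```

Because the gradient is measured rather than back-propagated, this style of update needs only forward passes, matching the abstract's point that inference and training stay in the optical domain without frequent optical-electrical conversion.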