Shiqi Chen, Yuhang Li, Yuntian Wang, Hanlong Chen, Aydogan Ozcan
Electrical and Computer Engineering Department, University of California Los Angeles, Los Angeles, CA, USA.
Bioengineering Department, University of California Los Angeles, Los Angeles, CA, USA.
Nature. 2025 Aug;644(8078):903-911. doi: 10.1038/s41586-025-09446-5. Epub 2025 Aug 27.
Generative models cover various application areas, including image and video synthesis, natural language processing and molecular design, among many others. As digital generative models become larger, scalable inference in a fast and energy-efficient manner becomes a challenge. Here we present optical generative models inspired by diffusion models, where a shallow and fast digital encoder first maps random noise into phase patterns that serve as optical generative seeds for a desired data distribution; a jointly trained free-space-based reconfigurable decoder all-optically processes these generative seeds to create never-before-seen images that follow the target data distribution. Except for the illumination power and the random seed generation through a shallow encoder, these optical generative models do not consume computing power during the synthesis of the images. We report the optical generation of monochrome and multicolour images of handwritten digits, fashion products, butterflies, human faces and artworks, following the data distributions of the MNIST, Fashion-MNIST, Butterflies-100 and Celeb-A datasets, and of Van Gogh's paintings and drawings, respectively, achieving an overall performance comparable to digital neural-network-based generative models. To experimentally demonstrate optical generative models, we used visible light to generate images of handwritten digits and fashion products. In addition, we generated Van Gogh-style artworks using both monochrome and multiwavelength illumination. These optical generative models might pave the way for energy-efficient and scalable inference tasks, further exploiting the potential of optics and photonics for artificial-intelligence-generated content.
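The pipeline described in the abstract — a shallow digital encoder that maps random noise to a phase pattern, followed by free-space optical propagation acting as the decoder — can be sketched numerically. The following is a minimal, hypothetical NumPy illustration, not the authors' implementation: the single-layer `shallow_encoder`, the grid size, wavelength and propagation distance are all assumed for demonstration, and the free-space decoder is modelled with the standard angular spectrum method for a unit-amplitude phase-only field.

```python
import numpy as np

def shallow_encoder(noise, weight):
    # Hypothetical shallow encoder: one linear layer, wrapped to a
    # phase pattern in [0, 2*pi) that serves as the optical generative seed.
    return (noise @ weight) % (2 * np.pi)

def angular_spectrum_propagate(phase, wavelength, dx, z):
    # Model of the free-space decoder: propagate a unit-amplitude,
    # phase-only field over distance z using the angular spectrum method.
    n = phase.shape[0]
    field = np.exp(1j * phase)
    fx = np.fft.fftfreq(n, d=dx)           # spatial frequencies (1/m)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)    # drop evanescent components
    out = np.fft.ifft2(np.fft.fft2(field) * H)
    return np.abs(out) ** 2                # intensity seen by a detector

# Toy example with assumed parameters (64x64 grid, 8 um pixels,
# 520 nm green illumination, 2 cm propagation distance).
rng = np.random.default_rng(0)
n = 64
noise = rng.standard_normal(16)            # random generative seed
weight = rng.standard_normal((16, n * n))  # untrained encoder weights
phase = shallow_encoder(noise, weight).reshape(n, n)
image = angular_spectrum_propagate(phase, wavelength=520e-9, dx=8e-6, z=0.02)
```

In the paper, the encoder and the reconfigurable diffractive decoder are trained jointly so that the propagated intensity follows the target data distribution; here the weights are random, so `image` is only a speckle-like pattern illustrating the data flow.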