Hung Albert, Zhang Charles J, Sexton Jonathan Z, O'Meara Matthew J, Welch Joshua D
Department of Computer Science and Engineering, University of Michigan, Ann Arbor, USA.
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA.
bioRxiv. 2024 Nov 8:2024.11.06.622339. doi: 10.1101/2024.11.06.622339.
The rapid advancement of high-content, single-cell technologies such as robotic confocal microscopy with multiplexed dyes (morphological profiling) can be leveraged to reveal fundamental biology, ranging from microbial and abiotic stress to organ development. Specifically, heterogeneous cell systems can be perturbed genetically or with chemical treatments to allow inference of causal mechanisms. An exciting strategy for navigating the high-dimensional space of possible perturbation and cell type combinations is to use generative models as priors that anticipate high-content outcomes, so that informative experiments can be designed. Towards this goal, we present the Latent diffUsion for Multiplexed Images of Cells (LUMIC) framework, which can generate high-quality, high-fidelity images of cells. LUMIC combines diffusion models with DINO (self-Distillation with NO labels), a vision-transformer-based, self-supervised method that can be trained on images to learn feature embeddings, and HGraph2Graph, a hierarchical graph encoder-decoder used to represent chemicals. To demonstrate the ability of LUMIC to generalize across cell lines and treatments, we apply it to two datasets: ~27,000 images from the JUMP Pilot dataset of two cell lines treated with 306 chemicals and stained with three dyes, and a newly generated dataset of ~3,000 images of five cell lines treated with 61 chemicals and stained with three dyes. To quantify prediction quality, we evaluate the DINO embeddings, the Kernel Inception Distance (KID) score, and the recovery of morphological feature distributions. LUMIC significantly outperforms previous methods and generates realistic out-of-sample images of cells across unseen compounds and cell types.
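For readers unfamiliar with the KID score mentioned above, the following is a minimal sketch of how it is commonly computed: an unbiased squared maximum mean discrepancy (MMD) estimate between two sets of feature vectors, using the cubic polynomial kernel k(x, y) = (x·y/d + 1)^3. This is not the authors' implementation. The function names are hypothetical, the feature vectors here are random stand-ins rather than Inception or DINO embeddings, and the usual practice of averaging the estimate over several random subsets is omitted for brevity.

```python
import numpy as np


def polynomial_kernel(x, y):
    """Cubic polynomial kernel k(x, y) = (x . y / d + 1)^3, d = feature dimension."""
    d = x.shape[1]
    return (x @ y.T / d + 1.0) ** 3


def kid_score(real, fake):
    """Unbiased squared-MMD estimate between two feature sets (rows = samples).

    Assumes each set has at least two samples. Values near zero suggest the
    two feature distributions are similar; the estimate can be slightly negative.
    """
    m, n = real.shape[0], fake.shape[0]
    k_rr = polynomial_kernel(real, real)
    k_ff = polynomial_kernel(fake, fake)
    k_rf = polynomial_kernel(real, fake)
    # Within-set terms exclude the diagonal (self-similarity) to stay unbiased.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    # Cross-set term averages over all real/fake pairs.
    return term_rr + term_ff - 2.0 * k_rf.mean()


# Illustrative stand-in features (NOT real image embeddings).
rng = np.random.default_rng(0)
real_features = rng.normal(size=(200, 16))
similar_features = rng.normal(size=(200, 16))
shifted_features = rng.normal(loc=1.0, size=(200, 16))

kid_similar = kid_score(real_features, similar_features)
kid_shifted = kid_score(real_features, shifted_features)
print(f"KID, same distribution:    {kid_similar:.4f}")
print(f"KID, shifted distribution: {kid_shifted:.4f}")
```

In this toy setup, the score for features drawn from the same distribution should be close to zero, while the shifted set should score noticeably higher; lower KID indicates that generated features more closely match the real ones.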