Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America.
Brookline High School, Brookline, Massachusetts, United States of America.
PLoS Comput Biol. 2022 Feb 25;18(2):e1009888. doi: 10.1371/journal.pcbi.1009888. eCollection 2022 Feb.
A variational autoencoder (VAE) is a machine learning algorithm that learns a compressed and interpretable latent space. VAE representations have been generated from various biomedical data types and can be used to produce realistic-looking simulated data. However, vanilla VAEs suffer from entangled and uninformative latent spaces, which can be mitigated by other VAE variants such as β-VAE and MMD-VAE. In this project, we evaluated the ability of VAEs to learn cell morphology characteristics derived from cell images. We trained and evaluated three VAE variants (vanilla VAE, β-VAE, and MMD-VAE) on cell morphology readouts and explored the generative capacity of each model to predict compound polypharmacology (the interactions of a drug with more than one target) using an approach called latent space arithmetic (LSA). To test the generalizability of the strategy, we also trained these VAEs on gene expression data from the same compound perturbations and found that gene expression provides complementary information. We found that the β-VAE and MMD-VAE disentangle morphology signals and reveal a more interpretable latent space. We reliably simulated morphology and gene expression readouts for certain compounds, thereby predicting cell states perturbed with compounds of known polypharmacology. Inferring cell state for specific drug mechanisms could help researchers develop and identify targeted therapeutics and categorize off-target effects in the future.
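The three variants differ mainly in the regularization term of the training loss, and LSA operates on the learned latent vectors. The sketch below is a minimal NumPy illustration of these ideas under stated assumptions, not the authors' implementation. It assumes a Gaussian encoder that outputs `mu` and `logvar`, a squared-error reconstruction term, an RBF-kernel MMD for the MMD-VAE, and one common form of LSA: adding the latent vectors of two single-mechanism compounds and subtracting a control. All function names are hypothetical.

```python
import numpy as np


def reconstruction_error(x, x_hat):
    # Assumption: squared-error reconstruction, summed over features, averaged over samples.
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))


def kl_to_standard_normal(mu, logvar):
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian encoder, averaged over the batch.
    return np.mean(-0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=1))


def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    # beta = 1 gives the vanilla VAE objective; beta > 1 gives a beta-VAE,
    # which weights the KL term more heavily to encourage disentanglement.
    return reconstruction_error(x, x_hat) + beta * kl_to_standard_normal(mu, logvar)


def rbf_kernel(a, b, sigma=1.0):
    # Gaussian kernel between every row of a and every row of b.
    d2 = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma ** 2))


def mmd(a, b, sigma=1.0):
    # Biased estimate of the squared maximum mean discrepancy between two samples.
    return (rbf_kernel(a, a, sigma).mean() + rbf_kernel(b, b, sigma).mean()
            - 2 * rbf_kernel(a, b, sigma).mean())


def mmd_vae_loss(x, x_hat, z, lam=1.0, seed=0):
    # MMD-VAE: replace the per-sample KL term with an MMD between the encoded
    # latent codes z and samples drawn from the N(0, I) prior.
    prior = np.random.default_rng(seed).standard_normal(z.shape)
    return reconstruction_error(x, x_hat) + lam * mmd(z, prior)


def latent_space_arithmetic(z_a, z_b, z_control):
    # Hypothetical LSA form: predict the latent state of a compound acting on
    # both mechanisms as z_a + z_b - z_control; the result would then be decoded
    # back into simulated morphology or gene expression readouts.
    return z_a + z_b - z_control


if __name__ == "__main__":
    z_pred = latent_space_arithmetic(np.array([1.0, 2.0]),
                                     np.array([3.0, 4.0]),
                                     np.array([0.5, 0.5]))
    print(z_pred)  # [3.5 5.5]
```

Setting `beta` above 1 or swapping the KL term for the MMD term is the only change between the three objectives in this sketch; the encoder and decoder architectures, which are not described in the abstract, are left out.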