Global Institute of Future Technology, Shanghai Jiaotong University, Shanghai, 200240, China.
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510000, China.
Adv Sci (Weinh). 2024 Aug;11(32):e2404845. doi: 10.1002/advs.202404845. Epub 2024 Jun 21.
Constructing discriminative representations of molecules lies at the core of many domains, including drug discovery, chemistry, and medicine. State-of-the-art methods employ graph neural networks and self-supervised learning (SSL) to learn structural representations from unlabeled data, which can then be fine-tuned for downstream tasks. Although powerful, these methods are pre-trained solely on molecular structures and thus often struggle with tasks that involve intricate biological processes. Here, it is proposed to assist the learning of molecular representations with phenotype-level, high-content microscopy images of perturbed cells. To enable cross-modal pre-training, a unified framework is constructed that aligns the two modalities through multiple types of contrastive loss functions; it proves effective on newly formulated tasks of retrieving molecules from their corresponding images and vice versa. More importantly, the model can infer functional molecules from cellular images generated by genetic perturbations. In parallel, the proposed model transfers non-trivially to molecular property prediction and shows great improvement on clinical outcome prediction. These results suggest that such cross-modality learning can bridge molecules and phenotypes and play an important role in drug discovery.
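The abstract does not specify the contrastive losses used, so the following is only a minimal sketch of one common choice: a symmetric InfoNCE (CLIP-style) objective over paired molecule and image embeddings, with nearest-neighbor retrieval on top. The function names (`symmetric_contrastive_loss`, `retrieve`), the temperature value, and the toy embeddings are all assumptions for illustration, not the authors' implementation.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(mol_emb, img_emb, temperature=0.1):
    """Hypothetical symmetric InfoNCE loss: pair i of molecule and image
    embeddings is the positive; all other pairs in the batch are negatives."""
    mol = [l2_normalize(v) for v in mol_emb]
    img = [l2_normalize(v) for v in img_emb]
    # logits[i][j] = cosine similarity of molecule i and image j, scaled
    logits = [[sum(a * b for a, b in zip(m, g)) / temperature for g in img]
              for m in mol]
    logits_t = [list(col) for col in zip(*logits)]  # image-to-molecule view
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(logits_t))

def retrieve(query, candidates):
    """Return the index of the candidate most similar to the query."""
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(c)))
              for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data (assumed): two molecules and two images in a shared 2-D space.
mol_emb = [[1.0, 0.0], [0.0, 1.0]]
img_aligned = [[1.0, 0.0], [0.0, 1.0]]
img_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_contrastive_loss(mol_emb, img_aligned)
loss_swapped = symmetric_contrastive_loss(mol_emb, img_swapped)
best_image = retrieve([1.0, 0.1], img_aligned)
```

In this toy setup the loss is lower when each molecule's embedding sits next to its own image than when the pairs are swapped, which is the property that makes mutual molecule-image retrieval by similarity possible after training.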