Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, Shanghai Tech University, Shanghai, 201210, China.
Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, Information Science and Technology, Shanghai Tech University, Shanghai Clinical Research and Trial Center, Shanghai, 201210, China.
Adv Sci (Weinh). 2024 Oct;11(40):e2403998. doi: 10.1002/advs.202403998. Epub 2024 Aug 29.
A molecular representation model is a neural network that converts molecular inputs (SMILES strings, molecular graphs) into feature vectors, and it is an essential module across a wide range of artificial intelligence-driven drug discovery scenarios. However, current molecular representation models rarely consider the three-dimensional conformational space of molecules. They therefore lose sight of the dynamic nature of small molecules and of the role conformational space plays in the heterogeneity of molecular properties, such as multi-target mechanisms of action, recognition of different biomolecules, and dynamics in the cytoplasm and membrane. In this study, a new model named GeminiMol is proposed that incorporates conformational space profiles into molecular representation learning, yielding features that capture the complicated interplay between molecular structure and conformational space. Although GeminiMol is pre-trained on a relatively small molecular dataset (39,290 molecules), it shows balanced and superior performance not only on 67 molecular property prediction tasks but also on 73 cellular activity prediction tasks and 171 zero-shot tasks (including virtual screening and target identification). By capturing the molecular conformational space profile, the strategy paves the way for rapid exploration of chemical space and facilitates a shift in paradigms for drug design.
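To make the idea of using fixed-length molecular feature vectors for a zero-shot task such as virtual screening more concrete, the following is a minimal sketch only. It does not use GeminiMol or any conformational information: `toy_encode` is a hypothetical stand-in that hashes character n-grams of a SMILES string into a vector, and `DIM`, `cosine`, and `rank_library` are likewise names assumed for illustration. A trained representation model would take the place of the toy encoder; the ranking step would stay the same.

```python
import hashlib
import math
from typing import List, Tuple

DIM = 64  # embedding width, chosen arbitrarily for this sketch


def toy_encode(smiles: str, n: int = 3) -> List[float]:
    """Hypothetical stand-in for a learned encoder (NOT GeminiMol).

    Hashes character n-grams of the SMILES string into a fixed-length,
    unit-normalized count vector.
    """
    vec = [0.0] * DIM
    padded = "^" + smiles + "$"  # mark start and end of the string
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        idx = int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; inputs are assumed to be unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


def rank_library(query: str, library: List[str]) -> List[Tuple[str, float]]:
    """Rank library molecules by embedding similarity to a query molecule.

    No task-specific training is involved, which is what makes the
    screening step "zero-shot" in this sketch.
    """
    q = toy_encode(query)
    scored = [(s, cosine(q, toy_encode(s))) for s in library]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    hits = rank_library("CCO", ["c1ccccc1", "CCO", "CCN"])
    for smiles, score in hits:
        print(smiles, round(score, 3))
```

Because the toy encoder only sees the characters of the string, it cannot distinguish molecules by their three-dimensional behaviour. That is the kind of limitation the abstract attributes to representations that ignore conformational space.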