Tibo Alessandro, He Jiazhen, Janet Jon Paul, Nittinger Eva, Engkvist Ola
Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D AstraZeneca, Gothenburg, Sweden.
Nat Commun. 2024 Aug 25;15(1):7315. doi: 10.1038/s41467-024-51672-4.
How many near-neighbors does a molecule have? This fundamental question in chemistry is crucial for molecular optimization problems under the similarity principle assumption. Generative models can sample molecules from a vast chemical space but lack explicit knowledge about molecular similarity. Therefore, these models need guidance from reinforcement learning to sample the relevant region of similar chemical space. However, they still lack a mechanism to measure the coverage of a specific region of the chemical space. To overcome these limitations, a source-target molecular transformer model, regularized via a similarity kernel function, is proposed. Trained on the largest dataset to date, comprising ≥200 billion molecular pairs, the model enforces a direct relationship between generating a target molecule and its similarity to a source molecule. Results indicate that the regularization term significantly improves the correlation between generation probability and molecular similarity, enabling exhaustive exploration of molecule near-neighborhoods.
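The abstract does not spell out the regularization term, so the following is only a minimal sketch of the general idea under stated assumptions: the Tanimoto coefficient on fingerprints (represented here as sets of on-bit indices) is assumed as the similarity, and a pairwise hinge ranking penalty is used as a hypothetical stand-in for the similarity-kernel regularizer. It rewards the model when a target that is more similar to the source receives a lower negative log-likelihood (NLL), i.e. a higher generation probability. The names `tanimoto`, `ranking_regularizer`, `regularized_loss`, `lam` and `margin` are illustrative and are not taken from the paper.

```python
from itertools import combinations
from typing import AbstractSet, Sequence


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention: two empty fingerprints are identical
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def ranking_regularizer(nlls: Sequence[float], sims: Sequence[float],
                        margin: float = 0.0) -> float:
    """Hypothetical pairwise hinge penalty (an assumption, not the authors' term).

    For every pair of targets where target `hi` is more similar to the source
    than target `lo`, penalize the model if NLL[hi] is not at least `margin`
    below NLL[lo], i.e. if the more similar target is not the more probable one.
    """
    penalty, n_pairs = 0.0, 0
    for i, j in combinations(range(len(nlls)), 2):
        if sims[i] == sims[j]:
            continue  # ties carry no ordering information
        hi, lo = (i, j) if sims[i] > sims[j] else (j, i)
        penalty += max(0.0, nlls[hi] - nlls[lo] + margin)
        n_pairs += 1
    return penalty / n_pairs if n_pairs else 0.0


def regularized_loss(nlls: Sequence[float], sims: Sequence[float],
                     lam: float = 1.0, margin: float = 0.0) -> float:
    """Mean NLL of the source-target pairs plus a weighted ranking penalty (hypothetical)."""
    return sum(nlls) / len(nlls) + lam * ranking_regularizer(nlls, sims, margin)


if __name__ == "__main__":
    source = {1, 2, 3, 4}
    targets = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 9}]
    sims = [tanimoto(source, t) for t in targets]  # [1.0, 0.6, 0.2]
    print(ranking_regularizer([0.5, 1.0, 2.0], sims))  # consistent ordering -> 0.0
    print(ranking_regularizer([2.0, 1.0, 0.5], sims))  # reversed ordering -> 1.0
```

In this sketch the penalty is zero when the NLL ordering already mirrors the similarity ordering, so minimizing the combined loss pushes generation probability to track similarity. That is the correlation the abstract reports improving, although the paper's actual kernel and weighting may differ.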