Kondratyev Vladimir, Dryzhakov Marian, Gimadiev Timur, Slutskiy Dmitriy
Computer Science and Artificial Intelligence Laboratory, ENGIE Lab CRIGEN, 4 rue Josephine Baker, 93240, Stains, France.
Telecom Paris, 19 Place Marguerite Perey, CS 20031, 91123, Palaiseau, France.
J Cheminform. 2023 Feb 2;15(1):11. doi: 10.1186/s13321-023-00681-4.
In this work, we further develop the junction tree variational autoencoder (JT VAE) architecture, both in its implementation and in the application of the model's internal feature space. Pretraining the JT VAE on a large dataset and further optimizing it jointly with a regression model yields a latent space that can solve several tasks simultaneously: prediction, generation, and optimization. We use the ZINC database as the source of molecules for JT VAE pretraining and the QM9 dataset with its HOMO (highest occupied molecular orbital) values to demonstrate the application. We evaluate our model on multiple tasks: property (value) prediction, generation of new molecules with predefined properties, and structure modification toward a target property. Across these tasks, our model improves on generation and optimization while preserving the precision of state-of-the-art models.
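The core idea in the abstract, a VAE latent space trained jointly with a property regressor so that a single latent vector serves prediction, generation, and optimization, can be illustrated with a toy sketch. This is a hypothetical, NumPy-only illustration, not the authors' implementation: the real model is a junction tree VAE over molecular graphs, whereas here the encoder, decoder, and regression head are stand-in linear maps, and the weighting factors `beta` and `gamma` are assumed hyperparameters.

```python
import numpy as np

# Toy sketch (assumed structure, not the paper's code): a VAE-style loss
# augmented with a property-regression term on the latent vector z.
rng = np.random.default_rng(0)

D_IN, D_Z = 16, 4                        # input and latent dimensions (toy)
W_enc_mu = rng.normal(size=(D_Z, D_IN)) * 0.1   # encoder mean map
W_enc_lv = rng.normal(size=(D_Z, D_IN)) * 0.1   # encoder log-variance map
W_dec    = rng.normal(size=(D_IN, D_Z)) * 0.1   # decoder map
w_reg    = rng.normal(size=D_Z) * 0.1           # regression head on z

def joint_loss(x, y, beta=1.0, gamma=1.0):
    """ELBO-style loss plus a property-regression penalty on z."""
    mu, logvar = W_enc_mu @ x, W_enc_lv @ x
    z = mu + np.exp(0.5 * logvar) * rng.normal(size=D_Z)  # reparameterize
    recon = np.mean((W_dec @ z - x) ** 2)                 # reconstruction
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))  # KL to N(0, I)
    reg = (w_reg @ z - y) ** 2            # property prediction error (MSE)
    return recon + beta * kl + gamma * reg

x = rng.normal(size=D_IN)   # stand-in for an encoded molecule
y = -0.25                   # stand-in for a HOMO value
loss = joint_loss(x, y)
print(float(loss))
```

Minimizing such a combined objective is what shapes the latent space for all three uses at once: `recon` and `kl` keep it generative, while `reg` organizes it along the property, so gradient steps in z can then modify structures toward a target value.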