Yan Chaochao, Yang Jinyu, Ma Hehuan, Wang Sheng, Huang Junzhou
Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA.
J Comput Biol. 2023 Jan;30(1):82-94. doi: 10.1089/cmb.2022.0063. Epub 2022 Aug 16.
Molecule generation produces initial novel molecule proposals for molecule design. Under the variational autoencoder (VAE) framework, molecules are first projected into continuous vectors in a chemical latent space, and these embedding vectors are then decoded back into molecules. The continuous latent space of a VAE can be used to generate novel molecules with desired chemical properties and to further optimize those properties. However, conventional recurrent neural network-based VAEs for molecule sequence generation suffer from posterior collapse, which degrades generation performance. We investigate this problem and find that an underestimated reconstruction loss is the main cause of posterior collapse in molecule sequence generation, and we present both analytical and experimental evidence to support this conclusion. Furthermore, we propose an efficient and effective solution that prevents posterior collapse. As a result, our method achieves competitive reconstruction accuracy and validity scores on the benchmark data sets.
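The balance between the two terms of a sequence VAE objective can be illustrated with a minimal NumPy sketch. The abstract does not say how the reconstruction loss comes to be underestimated, so the sketch assumes one common possibility: the per-token reconstruction loss is averaged over the sequence while the KL term is summed over latent dimensions. The function names and the `average_tokens` flag are hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)

def token_nll(logits, targets):
    """Per-token negative log-likelihood. logits: (T, V), targets: (T,)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets]

def negative_elbo(logits, targets, mu, logvar, average_tokens=False):
    """Return (loss, reconstruction term) for a single sequence.

    average_tokens=True is the assumed 'underestimated' variant: the
    reconstruction term shrinks by a factor of T relative to the KL term.
    """
    nll = token_nll(logits, targets)
    recon = nll.mean() if average_tokens else nll.sum()
    return recon + gaussian_kl(mu, logvar), recon

# Toy example: a 40-token sequence over a 30-symbol vocabulary, 16-dim latent.
rng = np.random.default_rng(0)
logits = rng.normal(size=(40, 30))
targets = rng.integers(0, 30, size=40)
mu, logvar = rng.normal(size=16), np.zeros(16)

_, recon_sum = negative_elbo(logits, targets, mu, logvar, average_tokens=False)
_, recon_mean = negative_elbo(logits, targets, mu, logvar, average_tokens=True)
print(recon_sum / recon_mean)  # ratio equals the sequence length T
```

Under this assumption, averaging makes the reconstruction term T times smaller than the summed version while the KL term is unchanged, so the optimizer is pushed harder toward a posterior that matches the prior. This is the kind of imbalance that can encourage posterior collapse; whether it matches the mechanism analyzed by the authors is not stated in the abstract.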