Griffiths Ryan-Rhys, Hernández-Lobato José Miguel
Cavendish Laboratory, Department of Physics, University of Cambridge, UK. Email:
Department of Engineering, University of Cambridge, UK. Email:
Chem Sci. 2019 Nov 18;11(2):577-586. doi: 10.1039/c9sc04026a. eCollection 2020 Jan 14.
Automatic Chemical Design is a framework for generating novel molecules with optimized properties. The original scheme, featuring Bayesian optimization over the latent space of a variational autoencoder, suffers from the pathology that it tends to produce invalid molecular structures. First, we demonstrate empirically that this pathology arises when the Bayesian optimization scheme queries latent space points far away from the data on which the variational autoencoder has been trained. Second, by reformulating the search procedure as a constrained Bayesian optimization problem, we show that the effects of this pathology can be mitigated, yielding marked improvements in the validity of the generated molecules. We posit that constrained Bayesian optimization is an effective approach for addressing this kind of training set mismatch in many generative tasks involving Bayesian optimization over the latent space of a variational autoencoder.
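The idea of constraining the search can be illustrated with a minimal sketch. The abstract does not state which acquisition function is used, so the code below assumes one common form of constrained Bayesian optimization: expected improvement multiplied by the predicted probability that a latent point decodes to a valid molecule. The function names, the two candidate points, and all numerical values are hypothetical and chosen only for illustration; in practice the mean, standard deviation, and validity probability would come from surrogate models fitted over the latent space.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best):
    """Expected improvement (maximization) under a Gaussian posterior."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * normal_cdf(z) + std * normal_pdf(z)


def constrained_ei(mean, std, best, p_valid):
    """Assumed constrained acquisition: EI weighted by Pr(valid decode)."""
    return expected_improvement(mean, std, best) * p_valid


# Hypothetical latent candidates: one close to the training data, one far
# away with an attractive objective prediction but a low validity probability.
candidates = [
    {"name": "near_data", "mean": 1.0, "std": 0.5, "p_valid": 0.90},
    {"name": "far_from_data", "mean": 1.5, "std": 1.0, "p_valid": 0.05},
]
best_so_far = 1.0

unconstrained = {
    c["name"]: expected_improvement(c["mean"], c["std"], best_so_far)
    for c in candidates
}
constrained = {
    c["name"]: constrained_ei(c["mean"], c["std"], best_so_far, c["p_valid"])
    for c in candidates
}

unconstrained_choice = max(unconstrained, key=unconstrained.get)
constrained_choice = max(constrained, key=constrained.get)
print("unconstrained picks:", unconstrained_choice)
print("constrained picks:", constrained_choice)
```

With these illustrative numbers, the unconstrained criterion prefers the point far from the data, while the validity-weighted criterion prefers the point near the data, which mirrors the training set mismatch described in the abstract.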