Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 60 5th Avenue, New York, New York, 10011, United States.
Center for Data Science, New York University, 60 5th Avenue, New York, New York, 10011, United States.
Sci Rep. 2019 Dec 31;9(1):20381. doi: 10.1038/s41598-019-56773-5.
A molecule's geometry, also known as its conformation, is one of its most important properties, determining the reactions it participates in, the bonds it forms, and the interactions it has with other molecules. Conventional conformation generation methods minimize hand-designed molecular force field energy functions that are often not well correlated with the true energy function of a molecule observed in nature. They generate geometrically diverse sets of conformations, some of which are very similar to the lowest-energy conformations and others of which are very different. In this paper, we propose a conditional deep generative graph neural network that learns an energy function in a data-driven manner, by directly learning to generate molecular conformations that are energetically favorable and more likely to be observed experimentally. On three large-scale datasets containing small molecules, we show that our method generates a set of conformations that on average is far more likely to be close to the corresponding reference conformations than those obtained from conventional force field methods. Our method maintains geometrical diversity by generating conformations that are not too similar to each other, and is also computationally faster. We also show that our method can be used to provide initial coordinates for conventional force field methods. On one of the evaluated datasets, we show that this combination offers the best of both methods, yielding generated conformations that are on average close to the reference conformations, with some very similar to them.
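The abstract compares generated conformations with reference conformations both on average and in the best case, and also mentions diversity within the generated set. The sketch below illustrates one way such a comparison could be scored. It is an assumption, not the paper's evaluation code: the abstract does not name a metric, so the sketch uses a root-mean-square deviation over interatomic distances, and the names `distance_rmsd`, `summarize`, and `diversity` are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return all interatomic distances for a list of (x, y, z) tuples."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def distance_rmsd(conf_a, conf_b):
    """RMS deviation between the interatomic distances of two conformations.

    Assumed metric (not taken from the paper). It is unchanged by rotating
    or translating either conformation, so no alignment step is needed.
    It cannot tell mirror images apart.
    """
    if len(conf_a) != len(conf_b):
        raise ValueError("conformations must have the same number of atoms")
    if len(conf_a) < 2:
        raise ValueError("need at least two atoms")
    da = pairwise_distances(conf_a)
    db = pairwise_distances(conf_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def summarize(generated, reference):
    """Mean and best (minimum) deviation of generated conformations."""
    scores = [distance_rmsd(conf, reference) for conf in generated]
    return {"mean": sum(scores) / len(scores), "best": min(scores)}


def diversity(generated):
    """Mean deviation over all pairs within the generated set."""
    pairs = list(combinations(generated, 2))
    return sum(distance_rmsd(a, b) for a, b in pairs) / len(pairs)


# Toy three-atom example with made-up coordinates.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
rotated = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 1.0, 0.0)]  # same shape
stretched = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0)]  # one bond doubled

stats = summarize([rotated, stretched], reference)
spread = diversity([rotated, stretched])
```

Here `stats["mean"]` corresponds to being close to the reference on average, `stats["best"]` to having at least one very similar conformation, and `spread` to how different the generated conformations are from one another.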