Leguy Jules, Cauchy Thomas, Glavatskikh Marta, Duval Béatrice, Da Mota Benoit
Laboratoire LERIA, UNIV Angers, SFR MathSTIC, 2 Bd Lavoisier, 49045, Angers, France.
Laboratoire MOLTECH-Anjou, UMR CNRS 6200, UNIV Angers, SFR MATRIX, 2 Bd Lavoisier, 49045, Angers, France.
J Cheminform. 2020 Sep 16;12(1):55. doi: 10.1186/s13321-020-00458-z.
The objective of this work is to design a molecular generator capable of exploring both known and unfamiliar areas of chemical space. The method must be flexible enough to adapt to very different problems, so it has to work with or without the influence of prior data and knowledge. Moreover, regardless of its success on a given task, it should be as interpretable as possible to allow for diagnosis and improvement. We propose a new open-source generation method, EvoMol, which uses an evolutionary algorithm to build molecular graphs sequentially. It is independent of starting data and can generate entirely unseen compounds. To search a large part of chemical space, we define an original set of 7 generic mutations that operate close to the atomic level. Our method achieves excellent performance, and even sets records, on QED, penalised logP, SAscore and CLscore, as well as on the set of goal-directed functions defined in GuacaMol. To demonstrate its flexibility, we tackle a very different objective drawn from the domain of organic molecular materials. We show that EvoMol can generate sets of optimised molecules with a high-energy HOMO or a low-energy LUMO, starting only from methane. Constraints can also be set on a synthesizability score and on structural features. Finally, the interpretability of EvoMol allows its exploration process to be visualised as a chemically relevant tree.
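The general idea of an evolutionary search that grows molecular graphs from methane through small atom-level mutations can be illustrated with a minimal, self-contained sketch. It is not the EvoMol implementation: the graph representation, the three mutations (`add_atom`, `substitute_atom`, `increase_bond`), the replacement rule in `evolve` and the objective `toy_score` are all hypothetical choices made for illustration, and the abstract does not specify which 7 mutations the authors use.

```python
import random

# Maximum number of bonds for neutral C, N and O (hydrogens are implicit).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}


def methane():
    """Starting point: a single carbon atom with implicit hydrogens."""
    return {"atoms": ["C"], "bonds": {}}  # bonds: {(i, j): order} with i < j


def copy_mol(mol):
    return {"atoms": list(mol["atoms"]), "bonds": dict(mol["bonds"])}


def used_valence(mol, i):
    """Sum of the bond orders attached to atom i."""
    return sum(order for pair, order in mol["bonds"].items() if i in pair)


def add_atom(mol, rng):
    """Illustrative mutation: attach a new atom by a single bond."""
    free = [i for i, el in enumerate(mol["atoms"])
            if used_valence(mol, i) < MAX_VALENCE[el]]
    if not free:
        return None
    new = copy_mol(mol)
    new["atoms"].append(rng.choice(sorted(MAX_VALENCE)))
    new["bonds"][(rng.choice(free), len(new["atoms"]) - 1)] = 1
    return new


def substitute_atom(mol, rng):
    """Illustrative mutation: change the element of one atom if valence allows."""
    i = rng.randrange(len(mol["atoms"]))
    options = [el for el in sorted(MAX_VALENCE)
               if el != mol["atoms"][i] and MAX_VALENCE[el] >= used_valence(mol, i)]
    if not options:
        return None
    new = copy_mol(mol)
    new["atoms"][i] = rng.choice(options)
    return new


def increase_bond(mol, rng):
    """Illustrative mutation: raise a bond order by one if both ends have room."""
    ok = [pair for pair, order in mol["bonds"].items()
          if order < 3 and all(used_valence(mol, k) < MAX_VALENCE[mol["atoms"][k]]
                               for k in pair)]
    if not ok:
        return None
    new = copy_mol(mol)
    new["bonds"][rng.choice(ok)] += 1
    return new


def toy_score(mol):
    """Hypothetical objective: prefer 6 heavy atoms, 2 of them oxygen (max 0)."""
    return -abs(len(mol["atoms"]) - 6) - abs(mol["atoms"].count("O") - 2)


def evolve(score, steps=200, pop_size=10, seed=0):
    """Mutate the current best individual; a better child replaces the worst."""
    rng = random.Random(seed)
    population = [methane() for _ in range(pop_size)]
    mutations = [add_atom, substitute_atom, increase_bond]
    for _ in range(steps):
        population.sort(key=score, reverse=True)
        child = rng.choice(mutations)(population[0], rng)
        if child is not None and score(child) > score(population[-1]):
            population[-1] = child
    return max(population, key=score)


if __name__ == "__main__":
    best = evolve(toy_score)
    print(best["atoms"], best["bonds"], toy_score(best))
```

Because every mutation checks valence before it is applied, each individual in the population remains a chemically plausible graph, which is what makes each step of such a search easy to inspect. A real objective such as QED or a HOMO energy would replace `toy_score` and would require a cheminformatics or quantum chemistry backend that this sketch deliberately leaves out.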