Lim Hyuntae, Jung YounJoon
Department of Chemistry, Seoul National University, Seoul 08826, Korea. Email:
Chem Sci. 2019 Aug 20;10(36):8306-8315. doi: 10.1039/c9sc02452b. eCollection 2019 Sep 28.
Prediction of aqueous solubilities or hydration free energies is an extensively studied area in machine learning applications in chemistry, since water is the sole solvent in living systems. For non-aqueous solutions, however, few machine learning studies have been undertaken so far, despite the fact that the solvation mechanism plays an important role in various chemical reactions. Here, we introduce Delfos (deep learning model for solvation free energies in generic organic solvents), a novel, machine-learning-based QSPR method that predicts solvation free energies for various organic solute and solvent systems. The novelty of Delfos lies in two separate solvent and solute encoder networks that quantify structural features of given compounds via word embedding and recurrent layers, augmented with an attention mechanism that extracts important substructures from the outputs of the recurrent neural networks. The predictor network then calculates the solvation free energy of a given solvent-solute pair using the features from the encoders. Using results from extensive calculations on 2495 solute-solvent pairs, we demonstrate that Delfos not only has great potential to reach accuracy comparable to that of state-of-the-art computational chemistry methods, but also offers information about which substructures play a dominant role in the solvation process.
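The encoder, attention, and predictor pipeline described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not give the layer sizes, the attention score function, or the pooling scheme, so dot-product attention, max pooling, and a linear predictor are assumptions chosen for brevity. The hand-written hidden-state vectors stand in for the outputs of the embedding and recurrent layers, and all function names (`attend`, `max_pool`, `predict`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys):
    """Dot-product attention (an assumed score function).

    For every query vector, returns the attention weights over `keys`
    and the weighted average (context vector) of `keys`.
    """
    weights, contexts = [], []
    dim = len(keys[0])
    for q in queries:
        w = softmax([dot(q, k) for k in keys])
        ctx = [sum(wi * k[d] for wi, k in zip(w, keys)) for d in range(dim)]
        weights.append(w)
        contexts.append(ctx)
    return weights, contexts


def max_pool(vectors):
    """Element-wise maximum over a sequence of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


def predict(solvent_states, solute_states, w, b):
    """Toy predictor: attention in both directions, pooling, linear layer.

    `w` and `b` are placeholder parameters, not fitted values.
    """
    _, solvent_ctx = attend(solvent_states, solute_states)
    _, solute_ctx = attend(solute_states, solvent_states)
    # Concatenate each hidden state with its context vector, then pool.
    u = max_pool([h + c for h, c in zip(solvent_states, solvent_ctx)])
    v = max_pool([g + c for g, c in zip(solute_states, solute_ctx)])
    features = u + v  # list concatenation: solvent features, solute features
    return dot(features, w) + b


# Hypothetical encoder outputs: one 2-dimensional vector per substructure.
solvent_states = [[1.0, 0.0], [0.0, 1.0]]
solute_states = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

weights, _ = attend(solvent_states, solute_states)
energy = predict(solvent_states, solute_states, [0.1] * 8, -1.5)
```

Each row of `weights` sums to one and indicates how strongly a solvent position attends to each solute position, which is the kind of quantity that could be inspected to see which substructures dominate a prediction. In a trained model the hidden states and predictor parameters would be learned from the solute-solvent data rather than written by hand.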