Bauer Christoph A, Schneider Gisbert, Göller Andreas H
Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), 8093, Zurich, Switzerland.
Bayer AG, Pharmaceuticals, R&D, 42096, Wuppertal, Germany.
J Cheminform. 2019 Sep 11;11(1):59. doi: 10.1186/s13321-019-0381-4.
We present machine learning (ML) models for hydrogen bond acceptor (HBA) and hydrogen bond donor (HBD) strengths. Our target values are quantum chemical (QC) free energies in solution for 1:1 hydrogen-bonded complex formation with the reference molecules 4-fluorophenol (for acceptors) and acetone (for donors). Our acceptor and donor databases are the largest on record, with 4426 and 1036 data points, respectively. After scanning over radial atomic descriptors and ML methods, our final trained HBA and HBD ML models achieve RMSEs of 3.8 kJ mol⁻¹ (acceptors) and 2.3 kJ mol⁻¹ (donors) on experimental test sets. This performance is comparable to that of previous models trained on experimental hydrogen bonding free energies, which indicates that molecular QC data can serve as a substitute for experiment. Ultimately, QC calculations could fully replace wet-lab chemistry for determining HBA/HBD strengths. As a possible chemical application of our ML models, we use our predicted HBA and HBD strengths as descriptors in two case studies on trends in intramolecular hydrogen bonding.
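To put the reported errors on a logarithmic scale, a free-energy error can be converted into decadic log units of the complexation constant via ΔG = −RT ln K. This is the scale familiar from the pKBHX acceptor scale, which is also based on 4-fluorophenol. Because the relation is linear, an RMSE in ΔG maps to an RMSE in log10 K by the constant factor 1/(RT ln 10). The sketch below assumes a temperature of 298.15 K. The function name `kj_per_mol_to_log10k` is hypothetical and is not part of the paper's method.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1
T = 298.15          # assumed temperature in K (not stated in the abstract)

def kj_per_mol_to_log10k(delta_g_kj: float, temperature: float = T) -> float:
    """Hypothetical helper: express a free-energy difference (kJ/mol)
    in log10 K units, using |dG| = RT ln(10) * |d log10 K|."""
    return delta_g_kj / (math.log(10) * R * temperature)

if __name__ == "__main__":
    # RMSE values reported in the abstract (kJ/mol)
    for label, rmse in (("acceptors", 3.8), ("donors", 2.3)):
        print(f"{label}: {rmse} kJ/mol ~ {kj_per_mol_to_log10k(rmse):.2f} log10 K units")
```

At 298.15 K, RT ln 10 is about 5.71 kJ mol⁻¹. Under that assumption, the acceptor RMSE corresponds to roughly 0.67 log units and the donor RMSE to roughly 0.40 log units.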