Worldwide MedChem, Pfizer Worldwide Research and Development, 445 Eastern Point Road, Groton, Connecticut 06340, USA.
J Comput Chem. 2013 Jul 15;34(19):1661-71. doi: 10.1002/jcc.23308. Epub 2013 May 7.
We introduce a class of partial atomic charge assignment methods that provides an ab initio quality description of the electrostatics of bioorganic molecules. The method uses a set of models that neither have a fixed functional form nor require a fixed set of parameters, and are therefore capable of capturing the complexities of the charge distribution in great detail. Random Forest regression is used to build separate charge models for the elements H, C, N, O, F, S, and Cl, using training data consisting of partial charges along with a description of their surrounding chemical environments; training set charges are generated by fitting to the B3LYP/6-31G* electrostatic potential (ESP) and are subsequently refined to improve the consistency and transferability of the charge assignments. For a set of 210 neutral, small organic molecules, the absolute hydration free energies calculated using these charges in conjunction with the Generalized Born solvation model show a low mean unsigned error, close to 1 kcal/mol, relative to the experimental data. On another large and independent test set of chemically diverse organic molecules, the method is shown to accurately reproduce the charge-dependent observables, ESP and dipole moment, from ab initio calculations. The method presented here automatically provides an estimate of potential errors in the charge assignment, enabling systematic improvement of these models using additional data. This work has implications not only for the future development of charge models but also for developing methods to describe many other chemical properties that require an accurate representation of the electronic structure of the system.
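The abstract names two charge-dependent observables (ESP and dipole moment) and one error metric (mean unsigned error) but gives no formulas. As a minimal sketch, assuming the standard point-charge definitions rather than the authors' actual implementation, these quantities can be computed from a set of atomic partial charges as follows. The function names, the two-site test molecule, and the hydration free energy values are hypothetical and serve only to illustrate the arithmetic; the unit conversion constants are the commonly used values.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320   # 1 e*Angstrom expressed in debye
COULOMB_KCAL = 332.0637         # Coulomb constant in kcal*Angstrom/(mol*e^2)


def dipole_moment(charges, coords):
    """Dipole magnitude in debye: |sum_i q_i * r_i| for charges (e) at coords (Angstrom)."""
    mu = [sum(q * r[k] for q, r in zip(charges, coords)) for k in range(3)]
    return E_ANGSTROM_TO_DEBYE * math.sqrt(sum(c * c for c in mu))


def esp_at(point, charges, coords):
    """Electrostatic potential in kcal/(mol*e) at `point`: sum_i q_i / |point - r_i|."""
    return COULOMB_KCAL * sum(q / math.dist(point, r) for q, r in zip(charges, coords))


def mean_unsigned_error(predicted, reference):
    """Mean of |predicted - reference| over paired values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical two-site molecule: +0.5 e and -0.5 e, 1 Angstrom apart.
charges = [0.5, -0.5]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

mu = dipole_moment(charges, coords)            # 0.5 e*Angstrom, in debye
v = esp_at((2.0, 0.0, 0.0), charges, coords)   # potential 1 Angstrom beyond the negative site

# Hypothetical hydration free energies in kcal/mol (predicted vs. experiment).
mue = mean_unsigned_error([-4.1, -6.9, 1.5], [-5.1, -6.3, 2.0])
```

For a neutral molecule the dipole computed this way does not depend on the choice of origin, which is why it is a convenient check on a charge model; the ESP comparison would in practice be made over a grid of points around the molecule rather than at a single point.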