U.S. Army Combat Capabilities Development Command (DEVCOM) Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States.
Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States.
J Chem Inf Model. 2022 Nov 28;62(22):5397-5410. doi: 10.1021/acs.jcim.2c00841. Epub 2022 Oct 14.
For many experimentally measured chemical properties that cannot be directly computed from first principles, existing physics-based models do not extrapolate well to out-of-sample molecules, and the experimental datasets themselves are too small for traditional machine learning (ML) approaches. To overcome these limitations, we apply a transfer learning approach in which we simultaneously train a multi-target regression model on a small number of molecules with experimentally measured values and a large number of molecules with related computed properties. We demonstrate this methodology by predicting the experimentally measured impact sensitivity of energetic crystals, finding that both the characteristics of the computed dataset and the model architecture are important to prediction accuracy on the small experimental dataset. Our directed message-passing neural network (D-MPNN) ML model using transfer learning outperforms direct-ML and physics-based models on a diverse test set, and the new methods described here are widely applicable to modeling many other structure-property relationships.
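The simultaneous training described in the abstract implies that most molecules carry only computed targets, while a few also carry an experimental value. One common way to train a multi-target regression model on such partially labeled data is to mask missing targets out of the loss. The sketch below illustrates that idea under stated assumptions: the abstract does not specify the loss, so the masking scheme, the function name `masked_mse`, the use of NaN to mark missing values, and the toy numbers are all hypothetical and not taken from the paper.

```python
import numpy as np


def masked_mse(predictions, targets):
    """Mean squared error over the entries of `targets` that are not NaN.

    Assumption (not from the paper): a missing target is encoded as NaN
    and simply contributes nothing to the loss.
    """
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    mask = ~np.isnan(targets)  # True where a label exists
    if not mask.any():
        return 0.0
    # Replace NaN with 0 before subtracting, then zero out masked entries.
    errors = np.where(mask, predictions - np.nan_to_num(targets), 0.0)
    return float((errors ** 2).sum() / mask.sum())


# Hypothetical toy batch of three molecules and two targets:
# column 0 = experimental property (measured for one molecule only),
# column 1 = related computed property (available for all molecules).
targets = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [np.nan, 6.0],
])
predictions = np.array([
    [1.5, 2.0],
    [9.0, 3.0],
    [9.0, 6.0],
])

loss = masked_mse(predictions, targets)
print(loss)
```

In this toy batch the two predictions of 9.0 for the unmeasured experimental target do not affect the loss, so the shared model is still updated by the computed targets for those molecules.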