Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1, Building 3, Moscow 119991, Russia.
Science Data Software, LLC, 14909 Forest Landing Circle, Rockville, Maryland 20850, United States.
J Phys Chem Lett. 2021 Sep 30;12(38):9213-9219. doi: 10.1021/acs.jpclett.1c02477. Epub 2021 Sep 16.
Machine learning has become common practice in chemistry. Yet despite the success of modern machine learning methods, scarce data limit their use. Transfer learning can help solve this problem. The approach assumes that a model trained on a sufficiently large data set captures general features of the chemical structures it was trained on, and that reusing these features on a data set with too few data points will greatly improve the quality of the new model. In this paper, we develop this approach for small organic molecules, implementing transfer learning with graph convolutional neural networks. We show a significant improvement in model performance for target properties with scarce data. The effects of data set composition on model quality and the applicability domain of the resulting models are also considered.
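The feature-reuse step described in the abstract can be illustrated with a toy sketch. This is not the paper's architecture or code. It assumes a single simplified graph-convolution layer (neighbor averaging, linear map, ReLU), mean pooling over atoms, and a new linear head fitted on a tiny target data set while the encoder stays frozen. The names `PRETRAINED_ENCODER`, `small_dataset`, `graph_conv`, `embed`, `fit_head`, and `predict` are hypothetical, and the encoder weights are placeholder numbers standing in for weights that would come from pretraining on a large source data set (that pretraining step is omitted here).

```python
# Toy sketch of feature reuse in transfer learning. All names and numbers are
# illustrative assumptions; nothing here is taken from the paper.

def graph_conv(neighbors, feats, weight):
    """Simplified graph convolution: average each atom with its neighbors,
    apply a linear map (one weight row per output unit), then ReLU."""
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [i] + nbrs
        mean = [sum(feats[j][d] for j in group) / len(group)
                for d in range(len(feats[0]))]
        out.append([max(0.0, sum(m * w for m, w in zip(mean, unit)))
                    for unit in weight])
    return out

def embed(neighbors, feats, encoder):
    """Run the frozen encoder layers, then mean-pool atoms into one vector."""
    h = feats
    for weight in encoder:
        h = graph_conv(neighbors, h, weight)
    return [sum(row[k] for row in h) / len(h) for k in range(len(h[0]))]

def fit_head(embeddings, targets, lr=0.1, epochs=500):
    """Fit only a new linear head by stochastic gradient descent on squared
    error; the encoder weights are never updated."""
    w = [0.0] * len(embeddings[0])
    b = 0.0
    for _ in range(epochs):
        for z, y in zip(embeddings, targets):
            err = sum(wi * zi for wi, zi in zip(w, z)) + b - y
            w = [wi - lr * err * zi for wi, zi in zip(w, z)]
            b -= lr * err
    return w, b

def predict(z, head):
    w, b = head
    return sum(wi * zi for wi, zi in zip(w, z)) + b

# Hypothetical stand-in for weights pretrained on a large source data set:
# one layer mapping 2 atom features to 2 hidden units.
PRETRAINED_ENCODER = [
    [[0.5, -0.2], [0.1, 0.8]],
]

# Tiny made-up target data set: (adjacency list, atom features, property).
small_dataset = [
    ([[1], [0]], [[1.0, 0.0], [0.0, 1.0]], 1.0),
    ([[1], [0, 2], [1]], [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], 2.0),
    ([[1, 2], [0, 2], [0, 1]], [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]], 0.5),
]

embeddings = [embed(n, x, PRETRAINED_ENCODER) for n, x, _ in small_dataset]
targets = [y for _, _, y in small_dataset]
head = fit_head(embeddings, targets)
print([round(predict(z, head), 2) for z in embeddings])
```

Only the head is trained on the small data set, so the number of fitted parameters stays small relative to the number of target data points. In practice the frozen layers could also be unfrozen and fine-tuned at a low learning rate; which option works better would depend on how closely the source and target data sets are related.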