Materials Science and Engineering Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States.
Materials Measurement Science Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States.
ACS Macro Lett. 2022 Sep 20;11(9):1117-1122. doi: 10.1021/acsmacrolett.2c00369. Epub 2022 Aug 26.
The application of machine learning to the materials domain has traditionally struggled with two major challenges: a lack of large, curated data sets and the need to understand the physics behind the machine-learning prediction. The former problem is particularly acute in the polymers domain. Here we aim to tackle both challenges simultaneously by incorporating scientific knowledge, thereby providing improved predictions for smaller data sets, under both interpolation and extrapolation, along with a degree of explainability. We focus on imperfect theories, as they are often readily available and easier to interpret. Using a system of a polymer in solvents of different quality, we explore numerous methods for incorporating theory into machine learning across different machine-learning models, including Gaussian process regression. Ultimately, we find that encoding the functional form of the theory performs best, followed by encoding the numeric values of the theory.
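As a rough illustration of what "encoding the functional form of the theory" can mean, the sketch below compares a theory-agnostic fit with one constrained to a power-law form when extrapolating from short chains to long ones. Everything in it is an assumption made for illustration and is not taken from the paper: the target quantity (a chain size `Rg` as a function of chain length `N`), the power-law form `Rg = b * N**nu`, the synthetic "ground truth" with a finite-size correction the theory omits, and the use of ordinary least squares as a stand-in for the machine-learning models named in the abstract.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def synthetic_rg(n, nu=0.588, b=0.5):
    """Hypothetical ground truth: a power law times a correction term
    that the (imperfect) power-law theory does not capture."""
    return b * n ** nu * (1.0 + 2.0 / math.sqrt(n))

# Small training set of short chains; test points lie far outside that range.
train_n = [20, 40, 60, 80, 100]
test_n = [400, 800]
train_rg = [synthetic_rg(n) for n in train_n]

# Baseline: theory-agnostic straight line in the raw variable N.
a0, b0 = fit_line(train_n, train_rg)

def predict_baseline(n):
    return a0 + b0 * n

# Theory-informed: assume Rg = b * N**nu, which is linear in log-log space,
# and learn the prefactor and exponent from the data.
a1, b1 = fit_line([math.log(n) for n in train_n],
                  [math.log(r) for r in train_rg])

def predict_theory_form(n):
    return math.exp(a1 + b1 * math.log(n))

def mean_abs_rel_error(predict, ns):
    """Mean absolute relative error against the synthetic ground truth."""
    return sum(abs(predict(n) - synthetic_rg(n)) / synthetic_rg(n)
               for n in ns) / len(ns)

err_baseline = mean_abs_rel_error(predict_baseline, test_n)
err_theory = mean_abs_rel_error(predict_theory_form, test_n)
print(f"baseline extrapolation error:    {err_baseline:.3f}")
print(f"theory-form extrapolation error: {err_theory:.3f}")
```

In this toy setting the power-law form is still imperfect, since the correction term biases the fitted exponent, yet it extrapolates much more gracefully than the unconstrained line. The same idea could in principle be carried over to a Gaussian process by using the theoretical form as a mean function, though that variant is not shown here.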