Chemical Data-Driven Research Center, Korea Research Institute of Chemical Technology, Daejeon, 34114, Republic of Korea.
Interface Materials and Engineering Laboratory, Korea Research Institute of Chemical Technology, Daejeon, 34114, Republic of Korea.
Macromol Rapid Commun. 2024 Aug;45(15):e2400161. doi: 10.1002/marc.202400161. Epub 2024 Jun 18.
Machine learning can be used to predict the properties of polymers and to explore vast chemical spaces. However, the limited number of available experimental datasets hinders improvement in a model's predictive performance. This study proposes a machine learning approach that leverages transfer learning and ensemble modeling to efficiently predict the glass transition temperature (Tg) of fluorinated polymers and to guide the design of high-Tg copolymers. First, the quantum machine 9 (QM9) dataset is used to pretrain the model, providing robust molecular representations for subsequent fine-tuning on a specialized copolymer dataset. Ensemble modeling is then used to further improve prediction robustness and reliability, mitigating problems that arise from the limited size and uneven distribution of the copolymer dataset. Finally, the fine-tuned ensemble model is used to navigate a vast chemical space comprising 61 monomers and to identify promising candidates for high-Tg fluorinated polymers. The model predicts 247 entries capable of achieving a Tg above 390 K, 14 of which are experimentally validated. This study demonstrates the potential of machine learning in material design and discovery, highlighting the effectiveness of transfer learning and ensemble modeling for overcoming the challenges posed by small datasets in complex copolymer systems.
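The final screening step (averaging an ensemble of fine-tuned models over monomer pairs and keeping candidates whose predicted Tg exceeds 390 K) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The model callables, the toy monomer Tg values, the linear mixing rule used by the toy models, and the composition grid are all hypothetical. Only the 390 K threshold comes from the abstract. Reporting the ensemble standard deviation next to the mean is an assumption, shown here as one common way to express prediction reliability.

```python
from itertools import combinations
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: (monomer_a, monomer_b, fraction_of_a) -> predicted Tg in K.
# In the study these would be fine-tuned networks; here they are toy stand-ins.
Model = Callable[[str, str, float], float]


def ensemble_predict(models: Sequence[Model], a: str, b: str, frac_a: float) -> Tuple[float, float]:
    """Return the ensemble mean and sample standard deviation of predicted Tg."""
    preds = [m(a, b, frac_a) for m in models]
    return mean(preds), stdev(preds)


def screen(models: Sequence[Model], monomers: Sequence[str],
           fractions: Sequence[float], tg_threshold: float = 390.0) -> List[tuple]:
    """Enumerate monomer pairs and compositions; keep those with mean Tg above the threshold."""
    hits = []
    for a, b in combinations(monomers, 2):
        for f in fractions:
            mu, sigma = ensemble_predict(models, a, b, f)
            if mu > tg_threshold:
                hits.append((a, b, f, mu, sigma))
    # Highest predicted Tg first.
    return sorted(hits, key=lambda h: h[3], reverse=True)


# --- Usage with toy data (hypothetical values, for illustration only) ---
toy_tg = {"M1": 420.0, "M2": 380.0, "M3": 350.0}


def make_toy_model(offset: float) -> Model:
    # Toy model: linear mixing of homopolymer Tg plus a per-model offset,
    # mimicking disagreement between ensemble members.
    return lambda a, b, f: f * toy_tg[a] + (1 - f) * toy_tg[b] + offset


models = [make_toy_model(o) for o in (-2.0, 0.0, 2.0)]
hits = screen(models, ["M1", "M2", "M3"], [0.25, 0.5, 0.75])
for a, b, f, mu, sigma in hits:
    print(f"{a}/{b} at {f:.2f}: Tg = {mu:.1f} +/- {sigma:.1f} K")
```

A candidate sitting exactly at the threshold (M1/M2 at 0.25, mean 390 K) is excluded because the filter uses a strict "above" comparison, matching the abstract's wording of a Tg over 390 K.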