Huo Wenjia, Liang Boyang, Wu Xiang, Zhang Zhenchang, Zhou Weichao, Wang Haihong, Ran Xupeng, Bai Yaoyao, Zheng Rongrong
School of Petrochemical Engineering, Shenyang University of Technology, Liaoyang 111003, China.
Polymers (Basel). 2025 Jul 30;17(15):2083. doi: 10.3390/polym17152083.
Machine learning (ML) has opened new opportunities for discovering high-performance materials with targeted properties to replace traditional engineering materials. The glass transition temperature (Tg) is a crucial property of polyimide (PI). However, small datasets reveal only part of the structural information and limit a model's ability to learn from the observed data. In this investigation, a dataset comprising 1261 PIs was assembled. A quantitative structure-property relationship targeting Tg was constructed using nine regression algorithms; Categorical Boosting (CatBoost) demonstrated the highest accuracy, achieving a coefficient of determination of 0.895 on the test set. SHapley Additive exPlanations (SHAP) analysis identified that the NumRotatableBonds descriptor had a significant negative impact on Tg. Finally, all-atom molecular dynamics (MD) simulations were performed on eight PI structures to verify the accuracy of the prediction model. The ML predictions were consistent with the MD simulations, with a lowest prediction deviation of approximately 6.75%, while time and resource consumption were greatly reduced. These findings emphasize the importance of extensive datasets for model training. This accessible and interpretable ML framework offers substantial acceleration over MD simulation and serves as a reference for the future structural design of PIs with a desired Tg.
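The abstract reports two figures of merit: a coefficient of determination on the test set and a percentage deviation between the ML prediction and the MD result. As a minimal sketch of how such quantities are commonly computed, the Python below uses the standard definition of the coefficient of determination (1 minus the ratio of residual to total sum of squares) and assumes that "prediction deviation" means the absolute difference relative to the MD value; the abstract does not state the exact formula or the temperature unit used. The function names `r_squared` and `percent_deviation` are hypothetical, and no data from the paper are reproduced.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def percent_deviation(predicted, reference):
    """Absolute deviation of `predicted` from `reference`, in percent.

    Assumption: the reference is the MD-simulated Tg. The result depends
    on the temperature unit (K vs. degrees C), which the abstract leaves open.
    """
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(predicted - reference) / abs(reference) * 100.0
```

Called with a list of measured Tg values and the corresponding model predictions, `r_squared` returns 1.0 for a perfect fit and 0.0 for a model that does no better than predicting the mean; `percent_deviation(tg_ml, tg_md)` gives the per-structure gap between the two methods under the stated assumption.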