Uddin Md Jamal, Fan Jitang
School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China.
Polymers (Basel). 2024 Apr 10;16(8):1049. doi: 10.3390/polym16081049.
The glass transition temperature of polymers is a key parameter in meeting the application requirements for energy absorption. Previous studies have provided some data, obtained through slow and expensive trial-and-error procedures. By learning from these data, machine learning algorithms can extract valuable knowledge and reveal essential insights. In this study, a dataset of 7174 samples was used. The polymers were represented numerically using two methods: Morgan fingerprints and molecular descriptors. During preprocessing, the dataset was scaled using a standard scaler. We removed low-variance features from the dataset and used the Pearson correlation coefficient to exclude highly correlated features. The most significant features were then selected using recursive feature elimination. Nine machine learning models were trained to predict the glass transition temperature, and their hyperparameters were tuned. The models were compared using mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R²). The extra trees regressor gave the best results. Significant features were also identified using statistical machine learning methods, and the SHAP method was used to show the influence of each feature on the model's output. This framework can be adapted to other properties at low computational cost.
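The preprocessing and modelling steps described above can be illustrated with a minimal sketch, assuming scikit-learn and NumPy. It is not the authors' code: the data are synthetic stand-ins for the fingerprint and descriptor features, and the thresholds, feature counts, tree counts, and the helper `drop_correlated` are hypothetical choices. The sketch applies the variance filter before scaling, because standardized features all have unit variance. The order used in the study is not stated in the abstract. Hyperparameter tuning, the other eight models, and SHAP analysis are omitted.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_selection import RFE, VarianceThreshold
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def drop_correlated(X, threshold=0.95):
    """Hypothetical helper: greedily keep a column only if its absolute
    Pearson correlation with every already-kept column is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return np.array(keep)


# Synthetic stand-in data (assumption): 12 features, 300 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
X[:, 10] = 2.0 * X[:, 0] + 1e-3 * rng.normal(size=300)  # near-copy of column 0
X[:, 11] = 1.0                                          # constant column
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 1. Remove zero-variance features (fit on the training split only).
vt = VarianceThreshold(threshold=0.0)
X_train_v = vt.fit_transform(X_train)
X_test_v = vt.transform(X_test)

# 2. Remove one feature from each highly correlated pair.
keep = drop_correlated(X_train_v, threshold=0.95)
X_train_c, X_test_c = X_train_v[:, keep], X_test_v[:, keep]

# 3. Standardize to zero mean and unit variance.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train_c)
X_test_s = scaler.transform(X_test_c)

# 4. Recursive feature elimination driven by tree feature importances.
rfe = RFE(
    ExtraTreesRegressor(n_estimators=100, random_state=0),
    n_features_to_select=5,
)
X_train_r = rfe.fit_transform(X_train_s, y_train)
X_test_r = rfe.transform(X_test_s)

# 5. Fit the final extra trees regressor and evaluate on held-out data.
model = ExtraTreesRegressor(n_estimators=200, random_state=0)
model.fit(X_train_r, y_train)
pred = model.predict(X_test_r)

mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
r2 = r2_score(y_test, pred)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

Fitting every transformer on the training split and only applying it to the test split keeps the held-out metrics free of information leakage. With real polymer data, the synthetic `X` and `y` would be replaced by the computed features and measured glass transition temperatures.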