Fatriansyah Jaka Fajar, Linuwih Baiq Diffa Pakarti, Andreano Yossi, Sari Intan Septia, Federico Andreas, Anis Muhammad, Surip Siti Norasmah, Jaafar Mariatti
Department of Metallurgical and Materials Engineering, Faculty of Engineering, Universitas Indonesia, Kampus UI Depok, Depok 16424, Indonesia.
Advanced Functional Material Research Group, Faculty of Engineering, Universitas Indonesia, Kampus UI Depok, Depok 16424, Indonesia.
Polymers (Basel). 2024 Aug 29;16(17):2464. doi: 10.3390/polym16172464.
Polymer materials have garnered significant attention due to their exceptional mechanical properties and diverse industrial applications. Understanding the glass transition temperature (Tg) of polymers is critical to prevent operational failures at specific temperatures. Traditional methods for measuring Tg, such as differential scanning calorimetry (DSC) and dynamic mechanical analysis, while accurate, are often time-consuming, costly, and susceptible to inaccuracies due to random and uncertain factors. To address these limitations, the present study investigates the potential of Simplified Molecular Input Line Entry System (SMILES) strings as descriptors in simple machine learning models to predict Tg efficiently and reliably. Five models were used: k-nearest neighbors (KNN), support vector regression (SVR), extreme gradient boosting (XGBoost, abbreviated XGB), artificial neural network (ANN), and recurrent neural network (RNN). SMILES descriptors were converted into numerical data using either One Hot Encoding (OHE) or Natural Language Processing (NLP). The study found that SMILES inputs with fewer than 200 characters were inadequate for accurately describing compound structures, while inputs exceeding 200 characters diminished model performance due to the curse of dimensionality. The ANN model achieved the highest R value of 0.79; however, the XGB model, with an R value of 0.774, exhibited the highest stability and shorter training times than the other models, making it the preferred choice for Tg prediction. The OHE method proved more efficient than NLP, giving faster training times across the KNN, SVR, XGB, and ANN models. Validation on new polymer data showed the XGB model's robustness, with an average prediction deviation of 9.76 from actual Tg values. These findings underscore the importance of optimizing SMILES conversion methods and model parameters to enhance prediction reliability.
Future research should focus on improving model accuracy and generalizability by incorporating additional features and advanced techniques. This study contributes to the development of efficient and reliable predictive models for polymer properties, facilitating the design and application of new polymer materials.
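To make the OHE step described in the abstract concrete, the following is a minimal sketch of character-level one-hot encoding of SMILES strings at a fixed length of 200 characters. Only the 200-character length comes from the abstract; the vocabulary construction, the zero-padding and truncation behavior, the helper names (build_vocab, one_hot_encode), and the example repeat-unit SMILES are assumptions made for illustration and may differ from what the authors implemented.

```python
# Illustrative sketch only: character-level one-hot encoding of SMILES strings.
# MAX_LEN = 200 follows the abstract; everything else here is an assumption.

MAX_LEN = 200


def build_vocab(smiles_list):
    """Map every character seen in the data to a column index (hypothetical helper)."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i for i, ch in enumerate(chars)}


def one_hot_encode(smiles, vocab, max_len=MAX_LEN):
    """Return a flat list of max_len * len(vocab) zeros and ones.

    Strings longer than max_len are truncated; shorter ones are zero-padded.
    """
    width = len(vocab)
    vec = [0] * (max_len * width)
    for pos, ch in enumerate(smiles[:max_len]):
        idx = vocab.get(ch)
        if idx is not None:  # characters outside the vocabulary stay all-zero
            vec[pos * width + idx] = 1
    return vec


# Assumed example inputs: repeat units written with '*' as attachment points.
smiles_examples = ["*CC*", "*CC(C)*", "*CC(c1ccccc1)*"]
vocab = build_vocab(smiles_examples)
features = [one_hot_encode(s, vocab) for s in smiles_examples]
```

Each row of features has a fixed width of 200 times the vocabulary size, so it can be passed to any tabular regressor. The width grows linearly with the chosen maximum length, which is consistent with the dimensionality concern the abstract raises for longer inputs.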