Yarahmadi Bita, Hashemianzadeh Seyed Majid, Milani Hosseini Seyed Mohammad-Reza
Real Samples Analysis Laboratory, Department of Chemistry, Iran University of Science and Technology, Tehran, Iran.
Molecular Simulation Research Laboratory, Department of Chemistry, Iran University of Science and Technology, Tehran, Iran.
Sci Rep. 2023 Jul 26;13(1):12111. doi: 10.1038/s41598-023-39374-1.
Molecularly imprinted polymers (MIPs) are synthetic polymers in which specific recognition sites for a particular target are created during synthesis. Owing to characteristics such as stability, ease of synthesis, reproducibility, reusability, high accuracy, and selectivity, these polymers have many applications. However, the wide variety of functional monomers, templates, and solvents, together with synthesis conditions such as pH, temperature, stirring rate, and time, limits the selectivity of imprinting. Optimizing the synthesis conditions experimentally has several drawbacks, including consumption of chemicals, equipment requirements, and time costs. Using machine learning (ML) to predict the imprinting factor (IF), which indicates the quality of imprinting, is a promising way to overcome these problems. ML offers several advantages, such as freedom from human error, high accuracy, high repeatability, and the ability to make predictions for large amounts of data in minimal time. In this research, ML was used to predict the IF with non-linear regression algorithms, namely classification and regression tree, support vector regression, and k-nearest neighbors, and with ensemble algorithms, namely gradient boosting (GB), random forest, and extra trees. The data sets were obtained experimentally in the laboratory, and the inputs included pH, the type of template, the type of monomer, the solvent, the distribution coefficient of the MIP (K_MIP), and the distribution coefficient of the non-imprinted polymer (K_NIP). The mutual information feature selection method was used to select the most important features affecting the IF. The results showed that the GB algorithm performed best in predicting the IF, giving the maximum R² value (R² = 0.871) and the minimum mean absolute error (MAE = -0.982) and mean squared error (MSE = -2.303).
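The quantities named in the abstract can be made concrete with a short sketch. It assumes the commonly used definitions IF = K_MIP / K_NIP and a batch-adsorption distribution coefficient K = (C0 - Ce) / Ce × V / m; the abstract does not state the formulas the authors used. The mutual-information function is a simple plug-in estimate for discrete features (such as template or monomer type) against a binned IF, not necessarily the estimator used in the study. All function names and demo values are hypothetical. The last line illustrates scikit-learn's `neg_mean_absolute_error` and `neg_mean_squared_error` scoring convention, which reports errors with a flipped sign; whether the negative MAE and MSE above come from that convention is an assumption.

```python
import math
from collections import Counter


def distribution_coefficient(c0, ce, volume_l, mass_g):
    """Batch-adsorption distribution coefficient K = (C0 - Ce) / Ce * V / m, in L/g.

    Assumed textbook definition; the study's exact protocol is not given in the abstract.
    """
    if ce <= 0 or mass_g <= 0:
        raise ValueError("Ce and polymer mass must be positive")
    return (c0 - ce) / ce * volume_l / mass_g


def imprinting_factor(k_mip, k_nip):
    """IF = K_MIP / K_NIP; values above 1 suggest an imprinting effect (assumed definition)."""
    if k_nip <= 0:
        raise ValueError("K_NIP must be positive")
    return k_mip / k_nip


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two discrete sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum of p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum(nxy / n * math.log(nxy * n / (px[x] * py[y]))
               for (x, y), nxy in joint.items())


def regression_metrics(y_true, y_pred):
    """Return MAE and MSE (as non-negative errors) and the coefficient of determination R^2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": ss_res / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


if __name__ == "__main__":
    # Hypothetical illustration values, not data from the study.
    k_mip = distribution_coefficient(c0=10.0, ce=2.0, volume_l=0.01, mass_g=0.02)
    k_nip = distribution_coefficient(c0=10.0, ce=5.0, volume_l=0.01, mass_g=0.02)
    print("IF =", imprinting_factor(k_mip, k_nip))  # prints "IF = 4.0"

    # Score one categorical feature by its MI with a binned IF ("high"/"low").
    template = ["A", "A", "B", "B"]
    if_bin = ["high", "high", "low", "low"]
    print("MI(template; IF bin) =", mutual_information(template, if_bin))

    m = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
    # scikit-learn's "neg_*" scorers report these errors with a flipped sign.
    print({"neg_MAE": -m["MAE"], "neg_MSE": -m["MSE"], "R2": m["R2"]})
```

In a feature-selection step, each candidate input would be scored this way and the highest-scoring ones kept. For continuous targets, scikit-learn's `mutual_info_regression` uses a nearest-neighbour estimator instead of the simple counting shown here.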