Department of Chemical Engineering, Sichuan University, Chengdu, PR China.
SAR QSAR Environ Res. 2021 May;32(5):379-393. doi: 10.1080/1062936X.2021.1902387. Epub 2021 Apr 7.
Linear and nonlinear quantitative structure-property relationship (QSPR) models were developed based on a dataset of 65 polymer-solvent combinations. Seven quantum chemical descriptors (dipole moment, hardness, chemical potential, electrophilicity index, total energy, and HOMO and LUMO orbital energies) were calculated for polymers and solvents with density functional theory at the B3LYP/6-31G(d) level. Because intrinsic viscosity correlates strongly with the weight, size, shape, and topological structure of polymers and solvents, topological descriptors were also used in this work. Meanwhile, the most appropriate polymer structure representation was investigated by considering 1-5 monomeric repeating units. The molecular descriptors were first screened using genetic algorithm-multiple linear regression (GA-MLR), which gave coefficients of determination (R²) of 0.78 and 0.83 for the training set and the prediction set, respectively. The support vector machine (SVM) model based on the selected descriptor subset showed an R² value of 0.95 for the training set and 0.93 for the prediction set. All statistical results suggest that the established QSPR models have good predictability. Furthermore, a new test set obtained from the literature was used for further validation. The R² values were 0.81 for the MLR model and 0.90 for the SVM model.
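The abstract names hardness, chemical potential, and electrophilicity index as descriptors and reports R² as the fit statistic, but gives no formulas. The sketch below shows one common way such quantities are obtained from HOMO and LUMO energies, assuming the usual conceptual-DFT (Koopmans-type) approximations: η = (E_LUMO − E_HOMO)/2, μ = (E_HOMO + E_LUMO)/2, and ω = μ²/(2η). Whether the authors used exactly these conventions is an assumption (some works omit the factor 1/2 in η), and the names `frontier_descriptors` and `r_squared` are hypothetical helpers, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FrontierDescriptors:
    """Descriptors derived from frontier orbital energies (same unit as inputs)."""
    hardness: float            # eta
    chemical_potential: float  # mu
    electrophilicity: float    # omega


def frontier_descriptors(e_homo: float, e_lumo: float) -> FrontierDescriptors:
    """Assumed conceptual-DFT definitions; not confirmed by the abstract."""
    if e_lumo <= e_homo:
        raise ValueError("expected e_lumo > e_homo")
    eta = (e_lumo - e_homo) / 2.0      # chemical hardness
    mu = (e_homo + e_lumo) / 2.0       # electronic chemical potential
    omega = mu ** 2 / (2.0 * eta)      # electrophilicity index
    return FrontierDescriptors(eta, mu, omega)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative orbital energies in eV; made-up values, not data from the paper.
    print(frontier_descriptors(e_homo=-6.0, e_lumo=-1.0))
    print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

Note that some QSPR studies report the squared Pearson correlation between observed and predicted values instead of 1 − SS_res/SS_tot; the two coincide only for least-squares fits with an intercept evaluated on their own training data, so they can differ for an external prediction set.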