A Machine Learning-Based QSAR Model for Benzimidazole Derivatives as Corrosion Inhibitors by Incorporating Comprehensive Feature Selection.

Author information

Research Institute of Natural Gas Technology, PetroChina Southwest Oil and Gas Field Company, Chengdu, 610213, China.

College of Chemistry, Sichuan University, Chengdu, Sichuan, 610064, People's Republic of China.

Publication information

Interdiscip Sci. 2019 Dec;11(4):738-747. doi: 10.1007/s12539-019-00346-7. Epub 2019 Sep 4.

Abstract

BACKGROUND

Computational prediction of the inhibition efficiency (IE) of inhibitor molecules is a crucial supplementary approach to designing novel molecules that efficiently inhibit corrosion of metallic surfaces.

PURPOSE

Here we develop a new machine learning-based predictor for the inhibition efficiency (IE) of benzimidazole derivatives.

METHODS

First, the inhibitor molecules were given a comprehensive numerical representation covering energy, electronic, topological, physicochemical and spatial properties derived from their 3-D structures, which yielded 150 valid structural descriptors. These descriptors were then investigated thoroughly. Multicollinearity-based clustering analysis was performed to remove linearly correlated feature variables, producing 47 feature clusters. Gini importance from a random forest (RF) was then used to measure the contribution of the descriptors within each cluster, and the descriptor with the highest Gini importance score in each cluster was selected, giving 47 descriptors free of linear correlation. In addition, considering the limited number of available inhibitors, different feature subsets were constructed according to the Gini importance ranking of the 47 descriptors.
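
The selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the clustering algorithm or the correlation cutoff, so the greedy grouping, the `threshold=0.9` value, the toy descriptor names (`d1`, `d2`, `d3`) and the importance scores are all assumptions made for illustration. In practice the scores could come from an impurity-based importance measure such as the `feature_importances_` attribute of a scikit-learn random forest.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no linear relationship to report
    return sxy / sqrt(sxx * syy)


def cluster_by_correlation(columns, threshold=0.9):
    """Greedy grouping (an assumed scheme): a descriptor joins the first
    cluster whose seed (first member) it correlates with at |r| >= threshold,
    otherwise it starts a new cluster.
    `columns` maps descriptor name -> list of values, one per molecule."""
    clusters = []
    for name, values in columns.items():
        for cluster in clusters:
            if abs(pearson(values, columns[cluster[0]])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def pick_representatives(clusters, importance):
    """Keep the highest-importance descriptor of each cluster, then rank
    the survivors from most to least important."""
    reps = [max(c, key=lambda name: importance[name]) for c in clusters]
    return sorted(reps, key=lambda name: importance[name], reverse=True)


# Toy data: d2 is an exact linear function of d1, d3 is only weakly related.
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.1, 4.1, 6.1, 8.1, 10.1],
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
# Hypothetical importance scores, standing in for random-forest output.
importance = {"d1": 0.2, "d2": 0.5, "d3": 0.3}

clusters = cluster_by_correlation(descriptors)
selected = pick_representatives(clusters, importance)
print(clusters)
print(selected)
```

The ranked list returned by `pick_representatives` is what nested feature subsets (top 1, top 2, and so on) would be cut from.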

RESULTS

Support vector machine (SVM) models based on the different feature subsets were tested by leave-one-out cross-validation. Comparison showed that the optimal SVM model used the top 11 descriptors with a polynomial (Poly) kernel. This model achieved a correlation coefficient (R) of 0.9589 and a root-mean-square error (RMSE) of 4.45, indicating that the proposed method gives the best performance on the current data.
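
The evaluation protocol above (leave-one-out cross-validation over nested top-k feature subsets, scored by R and RMSE) can be sketched as follows. This is an illustration under stated assumptions, not the published model: a one-nearest-neighbour regressor stands in for the SVM so that the example needs only the standard library, the toy data are invented, and the function names are hypothetical. A polynomial-kernel SVM regressor, for example scikit-learn's `SVR(kernel="poly")`, could be substituted for the stand-in `predict` function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def nearest_neighbour_predict(train_X, train_y, x):
    """Stand-in regressor (assumption): target of the closest training sample."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def loocv_predictions(X, y, predict):
    """Hold out each sample once, train on the rest, predict the held-out one."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(predict(train_X, train_y, X[i]))
    return preds


def evaluate_top_k(X, y, k, predict):
    """Score the subset made of the first k (highest-ranked) columns."""
    X_k = [row[:k] for row in X]
    preds = loocv_predictions(X_k, y, predict)
    return pearson_r(y, preds), rmse(y, preds)


# Toy data: columns are assumed to be sorted by importance already.
# The first column tracks the target; the second is noise.
X = [[1.0, 9.0], [2.0, 1.0], [3.0, 7.0], [4.0, 2.0], [5.0, 8.0], [6.0, 3.0]]
y = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]

results = {k: evaluate_top_k(X, y, k, nearest_neighbour_predict) for k in (1, 2)}
best_k = min(results, key=lambda k: results[k][1])  # lowest RMSE wins
for k, (r, err) in results.items():
    print(f"top-{k}: R={r:.4f} RMSE={err:.2f}")
print("best subset size:", best_k)
```

Selecting the subset by lowest RMSE is one possible criterion; the abstract reports both R and RMSE without stating which one drove the choice of 11 descriptors.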

CONCLUSION

Based on our model, six new benzimidazole molecules were designed, and the IE values predicted by the model indicate that two of them have high potential as outstanding corrosion inhibitors.
