Benchmarking of linear and nonlinear approaches for quantitative structure-property relationship studies of metal complexation with ionophores.

Author information

Tetko Igor V, Solov'ev Vitaly P, Antonov Alexey V, Yao Xiaojun, Doucet Jean Pierre, Fan Botao, Hoonakker Frank, Fourches Denis, Jost Piere, Lachiche Nicolas, Varnek Alexandre

Affiliations

Institute of Bioorganic & Petrochemistry, Kiev, Ukraine.

Publication information

J Chem Inf Model. 2006 Mar-Apr;46(2):808-19. doi: 10.1021/ci0504216.

Abstract

A benchmark of several popular methods, Associative Neural Networks (ANN), Support Vector Machines (SVM), k Nearest Neighbors (kNN), Maximal Margin Linear Programming (MMLP), Radial Basis Function Neural Network (RBFNN), and Multiple Linear Regression (MLR), is reported for quantitative structure-property relationships (QSPR) of stability constants logK1 for the 1:1 (M:L) and logβ2 for the 1:2 complexes of metal cations Ag+ and Eu3+ with diverse sets of organic molecules in water at 298 K and ionic strength 0.1 M. The methods were tested on three types of descriptors: molecular descriptors including E-state values, counts of atoms determined for E-state atom types, and substructural molecular fragments (SMF). Comparison of the models was performed using a 5-fold external cross-validation procedure. Robust statistical tests (bootstrap and Kolmogorov-Smirnov statistics) were employed to evaluate the significance of calculated models. The Wilcoxon signed-rank test was used to compare the performance of methods. Individual structure-complexation property models obtained with nonlinear methods performed significantly better than the models built using multilinear regression analysis (MLRA). However, averaging several MLRA models based on SMF descriptors gave predictions as good as those of the most efficient nonlinear techniques. Support Vector Machines and Associative Neural Networks contributed to the largest number of significant models. Models based on fragments (SMF descriptors and E-state counts) had higher prediction ability than those based on E-state indices. The use of SMF descriptors and E-state counts provided similar results, whereas E-state indices led to less significant models. The current study illustrates the difficulties of quantitative comparison of different methods: conclusions based on only one data set, without appropriate statistical tests, could be wrong.
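The abstract names two generic procedures, 5-fold external cross-validation and the Wilcoxon signed-rank test, without giving implementation details. The following is a minimal Python sketch of both ideas, not the authors' code. The one-descriptor least-squares model, the synthetic data, and all function names are assumptions made purely for illustration.

```python
import random
import statistics


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one hypothetical descriptor)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def external_cv_predictions(xs, ys, k=5, seed=0):
    """Predict each sample with a model fitted without that sample's fold."""
    preds = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a + b * xs[i]
    return preds


def rmse(ys, preds):
    """Root mean squared error of the held-out predictions."""
    return (sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)) ** 0.5


def wilcoxon_statistic(a, b):
    """Signed-rank statistic min(W+, W-) for paired samples a and b.

    Zero differences are dropped; tied absolute differences share
    their average rank. No p-value is computed in this sketch.
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    return min(w_plus, w_minus)


# Synthetic data (assumption): y depends linearly on one descriptor plus noise.
rng = random.Random(1)
xs = [rng.uniform(0.0, 10.0) for _ in range(50)]
ys = [2.0 + 0.5 * x + rng.gauss(0.0, 0.1) for x in xs]

preds = external_cv_predictions(xs, ys)
print("external 5-fold RMSE:", round(rmse(ys, preds), 3))
```

To compare two methods in this style, one would collect a paired error measure for each method (for example, per data set or per fold) and pass the two lists to `wilcoxon_statistic`. A small statistic relative to its null distribution suggests that one method is consistently better.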

