The Relevance of Goodness-of-fit, Robustness and Prediction Validation Categories of OECD-QSAR Principles with Respect to Sample Size and Model Type.

Affiliation

Institute of Chemistry, Loránd Eötvös University, Pázmány S.1/A, 1117, Budapest, Hungary.

Publication information

Mol Inform. 2022 Nov;41(11):e2200072. doi: 10.1002/minf.202200072. Epub 2022 Jul 25.

Abstract

We investigated the relevance of the validation principles for Quantitative Structure-Activity Relationship (QSAR) models issued by the Organisation for Economic Co-operation and Development (OECD). We examined the goodness-of-fit, robustness and predictivity categories in linear and nonlinear models using benchmark datasets. Most of our conclusions are drawn from the sample size dependence of the different validation parameters. We found that goodness-of-fit parameters misleadingly overestimate model quality on small samples. For neural network and support vector models, the feasibility of goodness-of-fit parameters is often questionable. We propose using the simplest y-scrambling method to estimate chance correlation. We found that the leave-one-out and leave-many-out cross-validation parameters can be rescaled to each other in all models, so the computationally feasible method should be chosen depending on the model type. We assessed the interdependence of the validation parameters by calculating their rank correlations. For linear models, goodness of fit and robustness correlate quite well across sample sizes, so one of the two approaches might be redundant. In the rank correlation between internal and external validation parameters, we found that the assignment of well and poorly modellable data to the training or the test set causes negative correlations.
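The three internal validation ideas named in the abstract (goodness of fit, leave-one-out and leave-many-out cross-validation, and y-scrambling) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' protocol: the synthetic data, the ordinary least-squares model, the helper names (`fit_ols`, `q2_cv`, `y_scrambled_r2`), the use of the full-sample mean in the Q² denominator, and the choice of 5 folds and 200 permutations.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for a linear model with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r2(y, y_hat):
    """Goodness of fit: 1 - RSS/TSS, evaluated on the fitting data."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def q2_cv(X, y, n_folds, rng):
    """Cross-validated Q^2; n_folds == len(y) gives leave-one-out,
    fewer folds give leave-many-out (assumed definition)."""
    idx = rng.permutation(len(y))
    press = 0.0
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        coef = fit_ols(X[train], y[train])
        press += np.sum((y[fold] - predict_ols(coef, X[fold])) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def y_scrambled_r2(X, y, n_perm, rng):
    """Mean R^2 after refitting on randomly permuted responses,
    a simple estimate of chance correlation."""
    scores = []
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        scores.append(r2(y_perm, predict_ols(fit_ols(X, y_perm), X)))
    return float(np.mean(scores))

# Small synthetic sample (hypothetical): 20 compounds, 5 descriptors,
# only the first descriptor carries signal.
rng = np.random.default_rng(0)
n, p = 20, 5
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * rng.normal(size=n)

r2_fit = r2(y, predict_ols(fit_ols(X, y), X))
q2_loo = q2_cv(X, y, n_folds=n, rng=rng)
q2_lmo = q2_cv(X, y, n_folds=5, rng=rng)
r2_scr = y_scrambled_r2(X, y, n_perm=200, rng=rng)

print(f"R2 (fit)        = {r2_fit:.3f}")
print(f"Q2 (LOO)        = {q2_loo:.3f}")
print(f"Q2 (LMO, 5-fold)= {q2_lmo:.3f}")
print(f"R2 (y-scrambled)= {r2_scr:.3f}")
```

On a small sample such as this one, the fitted R² is never below the cross-validated Q² for least squares, and the y-scrambled R² stays clearly above zero, which is one way to see why goodness-of-fit values alone can flatter a model when few compounds are available.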

