Sample-size dependence of validation parameters in linear regression models and in QSAR.

Affiliation

Institute of Chemistry, Loránd Eötvös University, Budapest, Hungary.

Publication information

SAR QSAR Environ Res. 2021 Apr;32(4):247-268. doi: 10.1080/1062936X.2021.1890208. Epub 2021 Mar 22.

DOI:10.1080/1062936X.2021.1890208
PMID:33749419
Abstract

We investigated how statistical validation parameters depend on the size of the sample used to fit multivariate linear curves. We observed that R² and related internal parameters were misleading, as they overestimated the goodness-of-fit of models at small sample sizes. Cross-validation metrics showed correct trends. By correcting the degrees of freedom of the models, the leave-one-out and leave-many-out results could be scaled to close to identical values. and -randomized validation parameters were calculated and the methods provided close to identical results. We suggest using the simplest methods in both cases. The external parameters followed correct trends with respect to sample size, but their sensitivity differed. We plotted the Roy-Ojha metrics in 2D and coloured the points according to other external parameters to provide an easy classification of models. Rank correlations were calculated between the performance parameters. Up to a certain sample size, goodness-of-fit and robustness were distinguishable, but above it the parameters were redundant. The external-internal pairs were weakly correlated. Our data show that all three aspects of validation are necessary at small sample sizes, but the internal check of robustness is not informative above a given sample size.
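The contrast between R² and cross-validation metrics at small sample sizes can be illustrated with textbook definitions. The sketch below is not the authors' code. It assumes ordinary least squares with an intercept and defines the leave-one-out metric as Q²_LOO = 1 − PRESS/SS_tot. It uses the standard adjusted R² only as one familiar example of a degrees-of-freedom correction, not as the correction used in the paper. All function names are hypothetical, and only the Python standard library is used.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        if abs(p) < 1e-12:
            raise ValueError("singular normal equations")
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def fit_ols(x, y):
    """Least-squares coefficients [intercept, b1, ..., bp] via the normal equations."""
    design = [[1.0] + list(row) for row in x]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def ss_tot(y):
    mean = sum(y) / len(y)
    return sum((yi - mean) ** 2 for yi in y)


def r2(x, y):
    """Internal goodness-of-fit: 1 - RSS / SS_tot on the training data."""
    coef = fit_ols(x, y)
    rss = sum((yi - predict(coef, row)) ** 2 for row, yi in zip(x, y))
    return 1.0 - rss / ss_tot(y)


def adjusted_r2(x, y):
    """Standard adjusted R^2 with p descriptors; requires n > p + 1."""
    n, p = len(y), len(x[0])
    return 1.0 - (1.0 - r2(x, y)) * (n - 1) / (n - p - 1)


def q2_loo(x, y):
    """Leave-one-out Q^2 = 1 - PRESS / SS_tot (full-sample mean in SS_tot)."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])  # refit without sample i
        press += (y[i] - predict(coef, x[i])) ** 2             # error on the held-out sample
    return 1.0 - press / ss_tot(y)


if __name__ == "__main__":
    rng = random.Random(1)
    p = 5  # number of pure-noise descriptors
    for n in (8, 15, 50, 200):
        x = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # response unrelated to x
        print(f"n={n:3d}  R2={r2(x, y):.2f}  "
              f"R2adj={adjusted_r2(x, y):.2f}  Q2_LOO={q2_loo(x, y):.2f}")
```

On pure-noise data with few samples and five descriptors, R² typically comes out high while Q²_LOO is much lower or negative. The gap shrinks as n grows. Each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual, so Q²_LOO never exceeds R² for this model.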

Similar articles

1. Sample-size dependence of validation parameters in linear regression models and in QSAR.
   SAR QSAR Environ Res. 2021 Apr;32(4):247-268. doi: 10.1080/1062936X.2021.1890208. Epub 2021 Mar 22.
2. The Relevance of Goodness-of-fit, Robustness and Prediction Validation Categories of OECD-QSAR Principles with Respect to Sample Size and Model Type.
   Mol Inform. 2022 Nov;41(11):e2200072. doi: 10.1002/minf.202200072. Epub 2022 Jul 25.
3. The quality of QSAR models: problems and solutions.
   SAR QSAR Environ Res. 2007 Jan-Mar;18(1-2):89-100. doi: 10.1080/10629360601053984.
4. On the use of the metric rm² as an effective tool for validation of QSAR models in computational drug design and predictive toxicology.
   Mini Rev Med Chem. 2012 Jun;12(6):491-504. doi: 10.2174/138955712800493861.
5. Beware of External Validation! - A Comparative Study of Several Validation Techniques used in QSAR Modelling.
   Curr Comput Aided Drug Des. 2018;14(4):284-291. doi: 10.2174/1573409914666180426144304.
6. Development of linear and nonlinear predictive QSAR models and their external validation using molecular similarity principle for anti-HIV indolyl aryl sulfones.
   J Enzyme Inhib Med Chem. 2008 Dec;23(6):980-95. doi: 10.1080/14756360701811379.
7. Development of classification model and QSAR model for predicting binding affinity of endocrine disrupting chemicals to human sex hormone-binding globulin.
   Chemosphere. 2016 Aug;156:1-7. doi: 10.1016/j.chemosphere.2016.04.077. Epub 2016 May 6.
8. Predictive QSAR modeling of CCR5 antagonist piperidine derivatives using chemometric tools.
   J Enzyme Inhib Med Chem. 2009 Feb;24(1):205-23. doi: 10.1080/14756360802051297.
9. Comparative chemometric modeling of cytochrome 3A4 inhibitory activity of structurally diverse compounds using stepwise MLR, FA-MLR, PLS, GFA, G/PLS and ANN techniques.
   Eur J Med Chem. 2009 Jul;44(7):2913-22. doi: 10.1016/j.ejmech.2008.12.004. Epub 2008 Dec 16.
10. The rm2 metrics and regression through origin approach: reliable and useful validation tools for predictive QSAR models (Commentary on 'Is regression through origin useful in external validation of QSAR models?').
    Eur J Pharm Sci. 2014 Oct 1;62:111-4. doi: 10.1016/j.ejps.2014.05.019. Epub 2014 May 29.

Cited by

1. Repeated time-series cross-validation: A new method to improved COVID-19 forecast accuracy in Malaysia.
   MethodsX. 2024 Oct 30;13:103013. doi: 10.1016/j.mex.2024.103013. eCollection 2024 Dec.
2. Computational modeling of PET imaging agents for vesicular acetylcholine transporter (VAChT) protein binding affinity: application of 2D-QSAR modeling and molecular docking techniques.
   In Silico Pharmacol. 2023 Apr 4;11(1):9. doi: 10.1007/s40203-023-00146-4. eCollection 2023.
3. The Relevance of Goodness-of-fit, Robustness and Prediction Validation Categories of OECD-QSAR Principles with Respect to Sample Size and Model Type.
   Mol Inform. 2022 Nov;41(11):e2200072. doi: 10.1002/minf.202200072. Epub 2022 Jul 25.
4. Response surface approach to optimize temperature, pH and time on antioxidant properties of wild bush honey from high altitude region (Kashmir Valley) of India.
   Saudi J Biol Sci. 2022 Feb;29(2):767-773. doi: 10.1016/j.sjbs.2021.10.049. Epub 2021 Oct 25.