Rational selection of training and test sets for the development of validated QSAR models.

Authors

Golbraikh Alexander, Shen Min, Xiao Zhiyan, Xiao Yun-De, Lee Kuo-Hsiung, Tropsha Alexander

Affiliation

Division of Medicinal Chemistry and Natural Products, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7360, USA.

Publication

J Comput Aided Mol Des. 2003 Feb-Apr;17(2-4):241-53. doi: 10.1023/a:1025386326946.

DOI: 10.1023/a:1025386326946
PMID: 13677490

Abstract

Quantitative Structure-Activity Relationship (QSAR) models are used increasingly to screen chemical databases and/or virtual chemical libraries for potentially bioactive molecules. These developments emphasize the importance of rigorous model validation to ensure that the models have acceptable predictive power. Using k nearest neighbors (kNN) variable selection QSAR method for the analysis of several datasets, we have demonstrated recently that the widely accepted leave-one-out (LOO) cross-validated R² (q²) is an inadequate characteristic to assess the predictive ability of the models [Golbraikh, A., Tropsha, A. Beware of q²! J. Mol. Graphics Mod. 20, 269-276, (2002)]. Herein, we provide additional evidence that there exists no correlation between the values of q² for the training set and accuracy of prediction (R²) for the test set and argue that this observation is a general property of any QSAR model developed with LOO cross-validation. We suggest that external validation using rationally selected training and test sets provides a means to establish a reliable QSAR model. We propose several approaches to the division of experimental datasets into training and test sets and apply them in QSAR studies of 48 functionalized amino acid anticonvulsants and a series of 157 epipodophyllotoxin derivatives with antitumor activity. We formulate a set of general criteria for the evaluation of predictive power of QSAR models.
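
The two quantities contrasted in the abstract, the LOO cross-validated q² on the training set and the prediction accuracy R² on an external test set, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: a plain kNN mean predictor stands in for the variable selection kNN method, the data are synthetic, the split is a simple positional one rather than a rational selection, q² is taken as 1 - PRESS / SS_total, and R² is taken as the squared Pearson correlation between observed and predicted test activities.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict an activity as the mean over the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loo_q2(X, y, k=3):
    """Leave-one-out cross-validated q2, computed as 1 - PRESS / SS_total."""
    preds = []
    for i in range(len(X)):
        # Hold out compound i and predict it from all the others.
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(rest_X, rest_y, X[i], k))
    mean_y = sum(y) / len(y)
    press = sum((obs - pred) ** 2 for obs, pred in zip(y, preds))
    ss_total = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - press / ss_total


def external_r2(train_X, train_y, test_X, test_y, k=3):
    """Squared Pearson correlation of observed vs. predicted test activities."""
    preds = [knn_predict(train_X, train_y, x, k) for x in test_X]
    n = len(test_y)
    mean_obs = sum(test_y) / n
    mean_pred = sum(preds) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(test_y, preds))
    var_obs = sum((o - mean_obs) ** 2 for o in test_y)
    var_pred = sum((p - mean_pred) ** 2 for p in preds)
    return cov ** 2 / (var_obs * var_pred)


# Synthetic two-descriptor dataset (hypothetical, for illustration only).
rng = random.Random(0)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(40)]
y = [2 * a + b + rng.gauss(0, 0.1) for a, b in X]

# Simple positional split: first 30 compounds train, last 10 test.
train_X, train_y = X[:30], y[:30]
test_X, test_y = X[30:], y[30:]

q2 = loo_q2(train_X, train_y)
r2 = external_r2(train_X, train_y, test_X, test_y)
print(f"LOO q2 (training set): {q2:.3f}")
print(f"External R2 (test set): {r2:.3f}")
```

Reporting both numbers side by side follows the abstract's argument that q² computed on the training set does not, by itself, indicate how well a model predicts an external test set.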

Similar articles

1. Rational selection of training and test sets for the development of validated QSAR models.
   J Comput Aided Mol Des. 2003 Feb-Apr;17(2-4):241-53. doi: 10.1023/a:1025386326946.
2. Beware of q²!
   J Mol Graph Model. 2002 Jan;20(4):269-76. doi: 10.1016/s1093-3263(01)00123-1.
3. Antitumor agents. 213. Modeling of epipodophyllotoxin derivatives using variable selection k nearest neighbor QSAR method.
   J Med Chem. 2002 May 23;45(11):2294-309. doi: 10.1021/jm0105427.
4. Combinatorial QSAR of ambergris fragrance compounds.
   J Chem Inf Comput Sci. 2004 Mar-Apr;44(2):582-95. doi: 10.1021/ci034203t.
5. Application of predictive QSAR models to database mining: identification and experimental validation of novel anticonvulsant compounds.
   J Med Chem. 2004 Apr 22;47(9):2356-64. doi: 10.1021/jm030584q.
6. Does rational selection of training and test sets improve the outcome of QSAR modeling?
   J Chem Inf Model. 2012 Oct 22;52(10):2570-8. doi: 10.1021/ci300338w. Epub 2012 Oct 3.
7. Novel inhibitors of human histone deacetylase (HDAC) identified by QSAR modeling of known inhibitors, virtual screening, and experimental validation.
   J Chem Inf Model. 2009 Feb;49(2):461-76. doi: 10.1021/ci800366f.
8. Application of validated QSAR models of D1 dopaminergic antagonists for database mining.
   J Med Chem. 2005 Nov 17;48(23):7322-32. doi: 10.1021/jm049116m.
9. Modeling of p38 mitogen-activated protein kinase inhibitors using the Catalyst HypoGen and k-nearest neighbor QSAR methods.
   J Mol Graph Model. 2004 Oct;23(2):129-38. doi: 10.1016/j.jmgm.2004.05.001.
10. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection.
    J Comput Aided Mol Des. 2002 May-Jun;16(5-6):357-69. doi: 10.1023/a:1020869118689.

Cited by

1. design of novel dihydropteridone derivatives with oxadiazoles as potent inhibitors of MCF-7 breast cancer cells.
   Front Chem. 2025 Jul 28;13:1590593. doi: 10.3389/fchem.2025.1590593. eCollection 2025.
2. Design and screening of novel 1,2,4-Triazole-3-thione derivatives as DCN1 inhibitors for anticardiac fibrosis based on 3D-QSAR modeling and molecular dynamics.
   Front Pharmacol. 2025 Jun 27;16:1590711. doi: 10.3389/fphar.2025.1590711. eCollection 2025.
3. Assessment of the rat acute oral toxicity of quinoline-based pharmaceutical scaffold molecules using QSTR, q-RASTR and machine learning methods.
   Mol Divers. 2025 Jun 27. doi: 10.1007/s11030-025-11265-9.
4. Development and Application of a Senolytic Predictor for Discovery of Novel Senolytic Compounds and Herbs.
   Molecules. 2025 Jun 19;30(12):2653. doi: 10.3390/molecules30122653.
5. QSPR analysis of physico-chemical and pharmacological properties of medications for Parkinson's treatment utilizing neighborhood degree-based topological descriptors.
   Sci Rep. 2025 May 15;15(1):16941. doi: 10.1038/s41598-025-00898-3.
6. Unraveling potent Glycyrrhiza glabra flavonoids as AKT1 inhibitors using network pharmacology and machine learning-assisted QSAR.
   Mol Divers. 2025 May 8. doi: 10.1007/s11030-025-11210-w.
7. In silico design of novel pyridazine derivatives as balanced multifunctional agents against Alzheimer's disease.
   Sci Rep. 2025 May 7;15(1):15910. doi: 10.1038/s41598-025-98182-x.
8. New Amidino-Substituted Benzimidazole Derivatives as Human Dipeptidyl Peptidase III Inhibitors: Synthesis, In Vitro Evaluation, QSAR, and Molecular Docking Studies.
   Int J Mol Sci. 2025 Apr 20;26(8):3899. doi: 10.3390/ijms26083899.
9. Molecular Docking Study and 3D-QSAR Model for Trans-Stilbene Derivatives as Ligands of CYP1B1.
   Int J Mol Sci. 2025 Jan 24;26(3):1002. doi: 10.3390/ijms26031002.
10. Discovery of novel AR antagonist via 3D-QSAR pharmacophore modeling: neuroprotective effects in 6-OHDA-induced SH-SY5Y cells and haloperidol-induced Parkinsonism in C57 bl/6 mice.
    Mol Divers. 2025 Feb 3. doi: 10.1007/s11030-025-11120-x.

References

1. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection.
   J Comput Aided Mol Des. 2002 May-Jun;16(5-6):357-69. doi: 10.1023/a:1020869118689.
2. Quantitative structure-activity relationship analysis of functionalized amino acid anticonvulsant agents using k nearest neighbor and simulated annealing PLS methods.
   J Med Chem. 2002 Jun 20;45(13):2811-23. doi: 10.1021/jm010488u.
3. Antitumor agents. 213. Modeling of epipodophyllotoxin derivatives using variable selection k nearest neighbor QSAR method.
   J Med Chem. 2002 May 23;45(11):2294-309. doi: 10.1021/jm0105427.
4. Beware of q²!
   J Mol Graph Model. 2002 Jan;20(4):269-76. doi: 10.1016/s1093-3263(01)00123-1.
5. Quantitative structure-antitumor activity relationships of camptothecin analogues: cluster analysis and genetic algorithm-based studies.
   J Med Chem. 2001 Sep 27;44(20):3254-63. doi: 10.1021/jm0005151.
6. Classification of environmental estrogens by physicochemical properties using principal component analysis and hierarchical cluster analysis.
   J Chem Inf Comput Sci. 2001 May-Jun;41(3):718-26. doi: 10.1021/ci000333f.
7. Identification of the descriptor pharmacophores using variable selection QSAR: applications to database mining.
   Curr Pharm Des. 2001 May;7(7):599-612. doi: 10.2174/1381612013397834.
8. Modeling antimalarial activity: application of Kinetic Energy Density Quantum Similarity Measures as descriptors in QSAR.
   J Chem Inf Comput Sci. 2000 Nov-Dec;40(6):1400-7. doi: 10.1021/ci0004558.
9. Construction of high-quality structure-property-activity regressions: the boiling points of sulfides.
   J Chem Inf Comput Sci. 2000 Jul;40(4):899-905. doi: 10.1021/ci990115q.
10. SAR of 9-amino-1,2,3,4-tetrahydroacridine-based acetylcholinesterase inhibitors: synthesis, enzyme inhibitory activity, QSAR, and structure-based CoMFA of tacrine analogues.
    J Med Chem. 2000 May 18;43(10):2007-18. doi: 10.1021/jm990971t.