Three useful dimensions for domain applicability in QSAR models using random forest.

Affiliation

Chemistry Modeling and Informatics, Merck Research Laboratories, Rahway, New Jersey 07065, USA.

Publication information

J Chem Inf Model. 2012 Mar 26;52(3):814-23. doi: 10.1021/ci300004n. Epub 2012 Mar 9.

DOI: 10.1021/ci300004n
PMID: 22385389
Abstract

One popular metric for estimating the accuracy of prospective quantitative structure-activity relationship (QSAR) predictions is based on the similarity of the compound being predicted to compounds in the training set from which the QSAR model was built. More recent work in the field has indicated that other parameters might be equally or more important than similarity. Here we make use of two additional parameters: the variation of prediction among random forest trees (less variation among trees indicates more accurate prediction) and the prediction itself (certain ranges of activity are intrinsically easier to predict than others). The accuracy of prediction for a QSAR model, as measured by the root-mean-square error, can be estimated by cross-validation on the training set at the time of model-building and stored as a three-dimensional array of bins. This is an obvious extension of the one-dimensional array of bins we previously proposed for similarity to the training set [Sheridan et al. J. Chem. Inf. Comput. Sci. 2004, 44, 1912-1928]. We show that using these three parameters simultaneously adds much more discrimination in prediction accuracy than any single parameter. This approach can be applied to any QSAR method that produces an ensemble of models. We also show that the root-mean-square errors produced by cross-validation are predictive of root-mean-square errors of compounds tested after the model was built.
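The binning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bin edges, the toy cross-validation records, the use of the population standard deviation across trees as the "variation" measure, and the helper names `bin_key`, `build_rmse_table`, and `estimate_rmse` are all assumptions made for illustration. Each cross-validated compound is assigned to a bin by its similarity to the training set, the spread of its per-tree predictions, and its predicted value, and the RMSE is accumulated per bin.

```python
import math
from bisect import bisect_right
from collections import defaultdict
from statistics import pstdev


def bin_key(similarity, tree_preds, edges):
    """Map a compound to an (i, j, k) bin index along the three dimensions."""
    prediction = sum(tree_preds) / len(tree_preds)  # ensemble prediction
    tree_sd = pstdev(tree_preds)                    # variation among trees (assumed measure)
    values = (similarity, tree_sd, prediction)
    return tuple(bisect_right(e, v) for e, v in zip(edges, values)), prediction


def build_rmse_table(cv_records, edges):
    """Accumulate squared cross-validation errors per bin and return the RMSE per bin."""
    sq_err = defaultdict(float)
    count = defaultdict(int)
    for similarity, tree_preds, observed in cv_records:
        key, prediction = bin_key(similarity, tree_preds, edges)
        sq_err[key] += (prediction - observed) ** 2
        count[key] += 1
    return {key: math.sqrt(sq_err[key] / count[key]) for key in count}


def estimate_rmse(table, edges, similarity, tree_preds):
    """Look up the expected RMSE for a new compound; None if its bin is empty."""
    key, _ = bin_key(similarity, tree_preds, edges)
    return table.get(key)


# Hypothetical interior bin edges for (similarity, tree SD, prediction).
edges = ([0.5], [0.5], [6.0])

# Toy cross-validation records: (similarity, per-tree predictions, observed activity).
cv_records = [
    (0.9, [4.9, 5.1], 5.2),
    (0.8, [5.3, 5.7], 5.3),
    (0.3, [6.0, 8.0], 8.5),
    (0.2, [6.5, 8.5], 6.0),
]

table = build_rmse_table(cv_records, edges)
```

A prospective compound is then scored with `estimate_rmse(table, edges, similarity, tree_preds)`. In this sketch a compound that falls in a bin with no cross-validation data returns `None`; how sparse or empty bins should be handled in practice is not specified in the abstract.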

Similar articles

1. Three useful dimensions for domain applicability in QSAR models using random forest. J Chem Inf Model. 2012 Mar 26;52(3):814-23. doi: 10.1021/ci300004n. Epub 2012 Mar 9.
2. The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity. J Chem Inf Model. 2015 Jun 22;55(6):1098-107. doi: 10.1021/acs.jcim.5b00110. Epub 2015 Jun 4.
3. Using random forest to model the domain applicability of another random forest model. J Chem Inf Model. 2013 Nov 25;53(11):2837-50. doi: 10.1021/ci400482e. Epub 2013 Nov 5.
4. Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection. J Chem Inf Model. 2008 Sep;48(9):1733-46. doi: 10.1021/ci800151m. Epub 2008 Aug 26.
5. Kinase-kernel models: accurate in silico screening of 4 million compounds across the entire human kinome. J Chem Inf Model. 2012 Jan 23;52(1):156-70. doi: 10.1021/ci200314j. Epub 2012 Jan 6.
6. Assessing the reliability of a QSAR model's predictions. J Mol Graph Model. 2005 Jun;23(6):503-23. doi: 10.1016/j.jmgm.2005.03.003.
7. Unified QSAR approach to antimicrobials. Part 3: first multi-tasking QSAR model for input-coded prediction, structural back-projection, and complex networks clustering of antiprotozoal compounds. Bioorg Med Chem. 2008 Jun 1;16(11):5871-80. doi: 10.1016/j.bmc.2008.04.068. Epub 2008 Apr 29.
8. Similarity to molecules in the training set is a good discriminator for prediction accuracy in QSAR. J Chem Inf Comput Sci. 2004 Nov-Dec;44(6):1912-28. doi: 10.1021/ci049782w.
9. Applicability domains for classification problems: Benchmarking of distance to models for Ames mutagenicity set. J Chem Inf Model. 2010 Dec 27;50(12):2094-111. doi: 10.1021/ci100253r. Epub 2010 Oct 29.
10. Comments on the definition of the Q2 parameter for QSAR validation. J Chem Inf Model. 2009 Jul;49(7):1669-78. doi: 10.1021/ci900115y.

Cited by

1. Applicability Domain for Trustable Predictions. Methods Mol Biol. 2025;2834:131-149. doi: 10.1007/978-1-0716-4003-6_6.
2. Machine learning driven web-based app platform for the discovery of monoamine oxidase B inhibitors. Sci Rep. 2024 Feb 28;14(1):4868. doi: 10.1038/s41598-024-55628-y.
3. Rethinking the applicability domain analysis in QSAR models. J Comput Aided Mol Des. 2024 Feb 14;38(1):9. doi: 10.1007/s10822-024-00550-8.
4. Exposing the Limitations of Molecular Machine Learning with Activity Cliffs. J Chem Inf Model. 2022 Dec 12;62(23):5938-5951. doi: 10.1021/acs.jcim.2c01073. Epub 2022 Dec 1.
5. Uncertainty quantification: Can we trust artificial intelligence in drug discovery? iScience. 2022 Jul 21;25(8):104814. doi: 10.1016/j.isci.2022.104814. eCollection 2022 Aug 19.
6. A novel artificial intelligence protocol to investigate potential leads for Parkinson's disease. RSC Adv. 2020 Jun 16;10(39):22939-22958. doi: 10.1039/d0ra04028b.
7. Introduction to the BioChemical Library (BCL): An Application-Based Open-Source Toolkit for Integrated Cheminformatics and Machine Learning in Computer-Aided Drug Discovery. Front Pharmacol. 2022 Feb 21;13:833099. doi: 10.3389/fphar.2022.833099. eCollection 2022.
8. The use of Bayesian methodology in the development and validation of a tiered assessment approach towards prediction of rat acute oral toxicity. Arch Toxicol. 2022 Mar;96(3):817-830. doi: 10.1007/s00204-021-03205-x. Epub 2022 Jan 16.
9. Semi-automated workflow for molecular pair analysis and QSAR-assisted transformation space expansion. J Cheminform. 2021 Nov 13;13(1):86. doi: 10.1186/s13321-021-00564-6.
10. A hybrid framework for improving uncertainty quantification in deep learning-based QSAR regression modeling. J Cheminform. 2021 Sep 20;13(1):69. doi: 10.1186/s13321-021-00551-x.