
Using random forest to model the domain applicability of another random forest model.

Affiliation

Cheminformatics Department, Merck Research Laboratories, RY800-D133, Rahway, New Jersey 07065, United States.

Publication information

J Chem Inf Model. 2013 Nov 25;53(11):2837-50. doi: 10.1021/ci400482e. Epub 2013 Nov 5.

DOI:10.1021/ci400482e
PMID:24152204
Abstract

In QSAR, a statistical model is generated from a training set of molecules (represented by chemical descriptors) and their biological activities. We will call this traditional type of QSAR model an "activity model". The activity model can be used to predict the activities of molecules not in the training set. A relatively new subfield for QSAR is domain applicability. The aim is to estimate the reliability of prediction of a specific molecule on a specific activity model. A number of different metrics have been proposed in the literature for this purpose. It is desirable to build a quantitative model of reliability against one or more of these metrics. We can call this an "error model". A previous publication from our laboratory (Sheridan J. Chem. Inf. Model., 2012, 52, 814-823.) suggested the simultaneous use of three metrics would be more discriminating than any one metric. An error model could be built in the form of a three-dimensional set of bins. When the number of metrics exceeds three, however, the bin paradigm is not practical. An obvious solution for constructing an error model using multiple metrics is to use a QSAR method, in our case random forest. In this paper we demonstrate the usefulness of this paradigm, specifically for determining whether a useful error model can be built and which metrics are most useful for a given problem. For the ten data sets and for the seven metrics we examine here, it appears that it is possible to construct a useful error model using only two metrics (TREE_SD and PREDICTED). These do not require calculating similarities/distances between the molecules being predicted and the molecules used to build the activity model, which can be rate-limiting.
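The two-metric error model described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, synthetic data in place of real descriptors and activities, a held-out split to obtain prediction errors, and absolute error as the error model's target; the abstract specifies none of these, so they are illustrative choices and not the authors' protocol. TREE_SD is taken here as the standard deviation of the individual trees' predictions and PREDICTED as the forest's prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for descriptors (X) and activities (y); not real QSAR data.
X = rng.normal(size=(600, 10))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=600)

# Assumed split: activity-model training set, error-model training set, test set.
X_act, X_rest, y_act, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_err, X_test, y_err, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# "Activity model": an ordinary random forest from descriptors to activity.
activity_model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_act, y_act)


def reliability_metrics(model, X):
    """Return a two-column array [TREE_SD, PREDICTED] for each row of X."""
    # Predictions of every individual tree, shape (n_trees, n_samples).
    per_tree = np.stack([tree.predict(X) for tree in model.estimators_])
    tree_sd = per_tree.std(axis=0)      # spread of the trees' predictions
    predicted = per_tree.mean(axis=0)   # the forest's prediction (mean over trees)
    return np.column_stack([tree_sd, predicted])


# Metrics and observed absolute errors for molecules the activity model has not seen.
M_err = reliability_metrics(activity_model, X_err)
abs_err = np.abs(M_err[:, 1] - y_err)

# "Error model": a second random forest from the two metrics to the observed error.
error_model = RandomForestRegressor(n_estimators=100, random_state=0).fit(M_err, abs_err)

# Estimated error for new molecules, using only TREE_SD and PREDICTED.
M_test = reliability_metrics(activity_model, X_test)
expected_abs_err = error_model.predict(M_test)
```

Neither metric requires comparing a new molecule against the training set; both come from the activity model's own output, which is the practical advantage the abstract points out.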


Similar articles

1
Using random forest to model the domain applicability of another random forest model.
J Chem Inf Model. 2013 Nov 25;53(11):2837-50. doi: 10.1021/ci400482e. Epub 2013 Nov 5.
2
The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity.
J Chem Inf Model. 2015 Jun 22;55(6):1098-107. doi: 10.1021/acs.jcim.5b00110. Epub 2015 Jun 4.
3
Three useful dimensions for domain applicability in QSAR models using random forest.
J Chem Inf Model. 2012 Mar 26;52(3):814-23. doi: 10.1021/ci300004n. Epub 2012 Mar 9.
4
General Approach to Estimate Error Bars for Quantitative Structure-Activity Relationship Predictions of Molecular Activity.
J Chem Inf Model. 2018 Aug 27;58(8):1561-1575. doi: 10.1021/acs.jcim.8b00114. Epub 2018 Jul 17.
5
Pre-processing feature selection for improved C&RT models for oral absorption.
J Chem Inf Model. 2013 Oct 28;53(10):2730-42. doi: 10.1021/ci400378j. Epub 2013 Oct 9.
6
Does rational selection of training and test sets improve the outcome of QSAR modeling?
J Chem Inf Model. 2012 Oct 22;52(10):2570-8. doi: 10.1021/ci300338w. Epub 2012 Oct 3.
7
Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection.
J Chem Inf Model. 2008 Sep;48(9):1733-46. doi: 10.1021/ci800151m. Epub 2008 Aug 26.
8
Rank order entropy: why one metric is not enough.
J Chem Inf Model. 2011 Sep 26;51(9):2302-19. doi: 10.1021/ci200170k. Epub 2011 Aug 29.
9
Contemporary QSAR classifiers compared.
J Chem Inf Model. 2007 Jan-Feb;47(1):219-27. doi: 10.1021/ci600332j.
10
QSAR model as a random event: A case of rat toxicity.
Bioorg Med Chem. 2015 Mar 15;23(6):1223-30. doi: 10.1016/j.bmc.2015.01.055. Epub 2015 Feb 7.

Cited by

1
Data-Driven Approach Considering Imbalance in Data Sets and Experimental Conditions for Exploration of Photocatalysts.
ACS Omega. 2025 Apr 10;10(15):14626-14639. doi: 10.1021/acsomega.4c06997. eCollection 2025 Apr 22.
2
Rethinking the applicability domain analysis in QSAR models.
J Comput Aided Mol Des. 2024 Feb 14;38(1):9. doi: 10.1007/s10822-024-00550-8.
3
Characterizing Soil Profile Salinization in Cotton Fields Using Landsat 8 Time-Series Data in Southern Xinjiang, China.
Sensors (Basel). 2023 Aug 7;23(15):7003. doi: 10.3390/s23157003.
4
Uncertainty quantification: Can we trust artificial intelligence in drug discovery?
iScience. 2022 Jul 21;25(8):104814. doi: 10.1016/j.isci.2022.104814. eCollection 2022 Aug 19.
5
HobPre: accurate prediction of human oral bioavailability for small molecules.
J Cheminform. 2022 Jan 6;14(1):1. doi: 10.1186/s13321-021-00580-6.
6
Machine Learning Applied to the Modeling of Pharmacological and ADMET Endpoints.
Methods Mol Biol. 2022;2390:61-101. doi: 10.1007/978-1-0716-1787-8_2.
7
A hybrid framework for improving uncertainty quantification in deep learning-based QSAR regression modeling.
J Cheminform. 2021 Sep 20;13(1):69. doi: 10.1186/s13321-021-00551-x.
8
Materials Precursor Score: Modeling Chemists' Intuition for the Synthetic Accessibility of Porous Organic Cage Precursors.
J Chem Inf Model. 2021 Sep 27;61(9):4342-4356. doi: 10.1021/acs.jcim.1c00375. Epub 2021 Aug 13.
9
Quantitative structure-activity relationship models for genotoxicity prediction based on combination evaluation strategies for toxicological alternative experiments.
Sci Rep. 2021 Apr 13;11(1):8030. doi: 10.1038/s41598-021-87035-y.
10
QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping.
J Cheminform. 2020 May 29;12(1):39. doi: 10.1186/s13321-020-00443-6.