Molecular Similarity-Based Domain Applicability Metric Efficiently Identifies Out-of-Domain Compounds.

Affiliation

Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702, United States.

Publication information

J Chem Inf Model. 2019 Jan 28;59(1):181-189. doi: 10.1021/acs.jcim.8b00597. Epub 2018 Nov 19.

DOI: 10.1021/acs.jcim.8b00597
PMID: 30404432

Abstract

Domain applicability (DA) is a concept introduced to gauge the reliability of quantitative structure-activity relationship (QSAR) predictions. A leading DA metric is ensemble variance, which is defined as the variance of predictions by an ensemble of QSAR models. However, this metric fails to identify large prediction errors in melting point (MP) data, despite the availability of large training data sets. In this study, we examined the performance of this metric on MP data and found that, for most molecules, ensemble variance increased as their structural similarity to the training molecules decreased. However, the metric decreased for "out-of-domain" molecules, i.e., molecules with little to no structural similarity to the training compounds. This explains why ensemble variance fails to identify large prediction errors. In contrast, a new molecular similarity-based DA metric that considers the contributions of all training molecules in gauging the reliability of a prediction successfully identified predictions of MP data for which the errors were large. To validate our results, we used four additional data sets of diverse molecular properties. We divided each data set into a training set and a test set at a ratio of approximately 2:1, ensuring a small fraction of the test compounds are out of the training domain. We then trained random forest (RF) models on the training data and made RF predictions for the test set molecules. Results from these data sets confirm that the new DA metric significantly outperformed ensemble variance in identifying predictions for out-of-domain compounds. For within-domain compounds, the two metrics performed similarly, with ensemble variance marginally but consistently outperforming the new DA metric. The new DA metric, which does not rely on an ensemble of QSAR models, can be deployed with any machine-learning method, including deep neural networks.
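The abstract defines ensemble variance only as the variance of an ensemble's predictions, and describes the new metric only as a similarity measure that weighs the contributions of all training molecules. The minimal Python sketch below illustrates both ideas under stated assumptions: fingerprints are modeled as plain sets of on-bit indices (a real workflow would generate them with a cheminformatics toolkit), similarity is the Tanimoto coefficient, and each training molecule contributes a weight exp(-a·d/(1 - d)) that decays with its Tanimoto distance d to the query. That weighting, the default a = 3, and the function names `tanimoto_distance`, `ensemble_variance`, and `similarity_da_score` are illustrative assumptions, not the paper's confirmed definitions.

```python
import math
from statistics import pvariance

# Fingerprints are modeled as frozensets of "on" bit indices (an assumption for
# illustration; real fingerprints would come from a cheminformatics toolkit).
Fingerprint = frozenset[int]


def tanimoto_distance(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    """Return 1 - Tanimoto similarity between two bit-set fingerprints."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def ensemble_variance(predictions: list[float]) -> float:
    """Population variance of one molecule's predictions across ensemble members."""
    return pvariance(predictions)


def similarity_da_score(query: Fingerprint, training: list[Fingerprint], a: float = 3.0) -> float:
    """Sum a distance-decaying weight over ALL training molecules (hypothetical form).

    Each training molecule contributes exp(-a * d / (1 - d)), where d is its
    Tanimoto distance to the query; a molecule at d = 1 contributes nothing.
    A higher score means more, and closer, training neighbours.
    """
    score = 0.0
    for fp in training:
        d = tanimoto_distance(query, fp)
        if d < 1.0:
            score += math.exp(-a * d / (1.0 - d))
    return score


if __name__ == "__main__":
    # Toy training set and queries (made-up bit indices, not real molecules).
    training_fps = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({2, 3, 4, 6})]
    in_domain = frozenset({1, 2, 3, 4})
    out_of_domain = frozenset({10, 11, 12})

    print(round(similarity_da_score(in_domain, training_fps), 4))  # 1.2707
    print(similarity_da_score(out_of_domain, training_fps))        # 0.0

    # Hypothetical ensemble predictions that agree closely for the
    # out-of-domain query, so ensemble variance alone would not flag it.
    print(ensemble_variance([150.0, 151.0, 149.0]))
```

Under these assumptions, a query that shares no fingerprint bits with any training molecule scores exactly 0 on the similarity metric no matter how closely the ensemble members agree, which mirrors the failure mode of ensemble variance that the abstract reports for out-of-domain compounds.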

Similar articles

1. Molecular Similarity-Based Domain Applicability Metric Efficiently Identifies Out-of-Domain Compounds.
   J Chem Inf Model. 2019 Jan 28;59(1):181-189. doi: 10.1021/acs.jcim.8b00597. Epub 2018 Nov 19.
2. General Approach to Estimate Error Bars for Quantitative Structure-Activity Relationship Predictions of Molecular Activity.
   J Chem Inf Model. 2018 Aug 27;58(8):1561-1575. doi: 10.1021/acs.jcim.8b00114. Epub 2018 Jul 17.
3. Dissecting Machine-Learning Prediction of Molecular Activity: Is an Applicability Domain Needed for Quantitative Structure-Activity Relationship Models Based on Deep Neural Networks?
   J Chem Inf Model. 2019 Jan 28;59(1):117-126. doi: 10.1021/acs.jcim.8b00348. Epub 2018 Nov 21.
4. The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity.
   J Chem Inf Model. 2015 Jun 22;55(6):1098-107. doi: 10.1021/acs.jcim.5b00110. Epub 2015 Jun 4.
5. Rank order entropy: why one metric is not enough.
   J Chem Inf Model. 2011 Sep 26;51(9):2302-19. doi: 10.1021/ci200170k. Epub 2011 Aug 29.
6. Evaluation of QSAR Equations for Virtual Screening.
   Int J Mol Sci. 2020 Oct 22;21(21):7828. doi: 10.3390/ijms21217828.
7. Comprehensive ensemble in QSAR prediction for drug discovery.
   BMC Bioinformatics. 2019 Oct 26;20(1):521. doi: 10.1186/s12859-019-3135-4.
8. Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection.
   J Chem Inf Model. 2008 Sep;48(9):1733-46. doi: 10.1021/ci800151m. Epub 2008 Aug 26.
9. Assessment of machine learning reliability methods for quantifying the applicability domain of QSAR regression models.
   J Chem Inf Model. 2014 Feb 24;54(2):431-41. doi: 10.1021/ci4006595. Epub 2014 Feb 11.
10. Three useful dimensions for domain applicability in QSAR models using random forest.
   J Chem Inf Model. 2012 Mar 26;52(3):814-23. doi: 10.1021/ci300004n. Epub 2012 Mar 9.

Cited by

1. Machine learning-driven generation and screening of potential ionic liquids for cellulose dissolution.
   J Cheminform. 2025 May 21;17(1):78. doi: 10.1186/s13321-025-01018-z.
2. Molecular property prediction using pretrained-BERT and Bayesian active learning: a data-efficient approach to drug design.
   J Cheminform. 2025 Apr 23;17(1):58. doi: 10.1186/s13321-025-00986-6.
3. MONSTROUS: a web-based chemical-transporter interaction profiler.
   Front Pharmacol. 2025 Feb 26;16:1498945. doi: 10.3389/fphar.2025.1498945. eCollection 2025.
4. Cheminformatics and artificial intelligence for accelerating agrochemical discovery.
   Front Chem. 2023 Nov 29;11:1292027. doi: 10.3389/fchem.2023.1292027. eCollection 2023.
5. Uncertainty quantification: Can we trust artificial intelligence in drug discovery?
   iScience. 2022 Jul 21;25(8):104814. doi: 10.1016/j.isci.2022.104814. eCollection 2022 Aug 19.
6. Thermodynamics-Based Model Construction for the Accurate Prediction of Molecular Properties From Partition Coefficients.
   Front Chem. 2021 Sep 13;9:737579. doi: 10.3389/fchem.2021.737579. eCollection 2021.
7. A hybrid framework for improving uncertainty quantification in deep learning-based QSAR regression modeling.
   J Cheminform. 2021 Sep 20;13(1):69. doi: 10.1186/s13321-021-00551-x.
8. A quantitative uncertainty metric controls error in neural network-driven chemical discovery.
   Chem Sci. 2019 Jul 11;10(34):7913-7922. doi: 10.1039/c9sc02298h. eCollection 2019 Sep 14.