Assessment of machine learning reliability methods for quantifying the applicability domain of QSAR regression models.

Affiliation

Faculty of Computer and Information Science, University of Ljubljana, Tržaška 25, 1000 Ljubljana, Slovenia.

Publication information

J Chem Inf Model. 2014 Feb 24;54(2):431-41. doi: 10.1021/ci4006595. Epub 2014 Feb 11.

Abstract

The vastness of chemical space and the relatively small coverage by experimental data recording molecular properties require us to identify subspaces, or domains, for which we can confidently apply QSAR models. The prediction of QSAR models in these domains is reliable, and potential subsequent investigations of such compounds would find that the predictions closely match the experimental values. Standard approaches in QSAR assume that predictions are more reliable for compounds that are "similar" to those in subspaces with denser experimental data. Here, we report on a study of an alternative set of techniques recently proposed in the machine learning community. These methods quantify prediction confidence through estimation of the prediction error at the point of interest. Our study includes 20 public QSAR data sets with continuous response and assesses the quality of 10 reliability scoring methods by observing their correlation with prediction error. We show that these new alternative approaches can outperform standard reliability scores that rely only on similarity to compounds in the training set. The results also indicate that the quality of reliability scoring methods is sensitive to data set characteristics and to the regression method used in QSAR. We demonstrate that at the cost of increased computational complexity these dependencies can be leveraged by integration of scores from various reliability estimation approaches. The reliability estimation techniques described in this paper have been implemented in an open source add-on package (https://bitbucket.org/biolab/orange-reliability) to the Orange data mining suite.
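
The evaluation idea in the abstract (score each prediction's reliability, then check how well the score tracks the actual prediction error) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: it does not use the orange-reliability API, it does not reproduce any of the ten methods or twenty data sets from the paper, and the function names (`knn_distance_score`, `bagging_variance_score`, `pearson`) are hypothetical. It contrasts one similarity-based score (mean distance to the nearest training compounds) with one error-estimation-style score (variance of predictions across bootstrap-trained models) on synthetic data, using a plain least-squares model as a stand-in for a QSAR regressor.

```python
import numpy as np

def knn_distance_score(X_train, X_query, k=5):
    """Similarity-based score: mean Euclidean distance from each query
    point to its k nearest training points (larger = less reliable)."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

def fit_linear(X, y):
    """Ordinary least squares with an intercept column (stand-in regressor)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def bagging_variance_score(X_train, y_train, X_query, n_models=30, seed=0):
    """Variance of predictions over models fit on bootstrap resamples
    of the training set (larger = less reliable)."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), len(X_train))  # sample with replacement
        preds.append(predict_linear(fit_linear(X_train[idx], y_train[idx]), X_query))
    return np.var(preds, axis=0)

def pearson(a, b):
    """Pearson correlation between a reliability score and the absolute error."""
    return float(np.corrcoef(a, b)[0, 1])

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic "descriptors": training data is concentrated near the origin,
    # query data is spread more widely (partly outside the training domain).
    X_train = rng.normal(size=(200, 3))
    X_query = rng.normal(scale=3.0, size=(100, 3))
    target = lambda X: (X ** 2).sum(axis=1)  # nonlinear response, assumed
    y_train = target(X_train) + rng.normal(scale=0.1, size=200)
    y_query = target(X_query)

    abs_err = np.abs(predict_linear(fit_linear(X_train, y_train), X_query) - y_query)
    for name, score in [
        ("kNN distance", knn_distance_score(X_train, X_query)),
        ("bagging variance", bagging_variance_score(X_train, y_train, X_query)),
    ]:
        print(f"{name}: correlation with |error| = {pearson(score, abs_err):.3f}")
```

A higher correlation would suggest that the score is a more useful indicator of where the model can be trusted. Which score does better is expected to depend on the data and on the regressor, which is consistent with the sensitivity the abstract reports.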

Similar articles

Assessment of machine learning reliability methods for quantifying the applicability domain of QSAR regression models.

J Chem Inf Model. 2014 Feb 24;54(2):431-41. doi: 10.1021/ci4006595. Epub 2014 Feb 11.

Localized heuristic inverse quantitative structure activity relationship with bulk descriptors using numerical gradients.

J Chem Inf Model. 2013 Aug 26;53(8):2001-17. doi: 10.1021/ci400281y. Epub 2013 Jul 25.

Merging applicability domains for in silico assessment of chemical mutagenicity.

J Chem Inf Model. 2014 Mar 24;54(3):793-800. doi: 10.1021/ci500016v. Epub 2014 Feb 14.

Prediction of antibacterial compounds by machine learning approaches.

J Comput Chem. 2009 Jun;30(8):1202-11. doi: 10.1002/jcc.21148.

J Chem Inf Model. 2019 Jan 28;59(1):181-189. doi: 10.1021/acs.jcim.8b00597. Epub 2018 Nov 19.

Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis.

J Chem Inf Model. 2008 Apr;48(4):766-84. doi: 10.1021/ci700443v. Epub 2008 Mar 1.

General Approach to Estimate Error Bars for Quantitative Structure-Activity Relationship Predictions of Molecular Activity.

J Chem Inf Model. 2018 Aug 27;58(8):1561-1575. doi: 10.1021/acs.jcim.8b00114. Epub 2018 Jul 17.

Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection.

J Chem Inf Model. 2008 Sep;48(9):1733-46. doi: 10.1021/ci800151m. Epub 2008 Aug 26.

Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules.

J Comput Aided Mol Des. 2007 Sep;21(9):485-98. doi: 10.1007/s10822-007-9125-z. Epub 2007 Jul 14.

SAR QSAR Environ Res. 2011 Jun;22(3):385-410. doi: 10.1080/1062936X.2011.569943.

Cited by

Dataset Design for Building Models of Chemical Reactivity.

ACS Cent Sci. 2023 Dec 8;9(12):2196-2204. doi: 10.1021/acscentsci.3c01163. eCollection 2023 Dec 27.

Federated Learning in Computational Toxicology: An Industrial Perspective on the Effiris Hackathon.

Chem Res Toxicol. 2023 Sep 18;36(9):1503-1517. doi: 10.1021/acs.chemrestox.3c00137. Epub 2023 Aug 16.

A hybrid framework for improving uncertainty quantification in deep learning-based QSAR regression modeling.

J Cheminform. 2021 Sep 20;13(1):69. doi: 10.1186/s13321-021-00551-x.

Assigning confidence to molecular property prediction.

Expert Opin Drug Discov. 2021 Sep;16(9):1009-1023. doi: 10.1080/17460441.2021.1925247. Epub 2021 Jun 15.

Developing a Kinase-Specific Target Selection Method Using a Structure-Based Machine Learning Approach.

Adv Appl Bioinform Chem. 2020 Dec 2;13:27-40. doi: 10.2147/AABC.S278900. eCollection 2020.

Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning.

Chem Sci. 2019 Jul 10;10(35):8154-8163. doi: 10.1039/c9sc00616h. eCollection 2019 Sep 21.

How Precise Are Our Quantitative Structure-Activity Relationship Derived Predictions for New Query Chemicals?

ACS Omega. 2018 Sep 19;3(9):11392-11406. doi: 10.1021/acsomega.8b01647. eCollection 2018 Sep 30.

Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening.

Wiley Interdiscip Rev Comput Mol Sci. 2015 Nov-Dec;5(6):405-424. doi: 10.1002/wcms.1225. Epub 2015 Aug 28.
