Local lazy regression: making use of the neighborhood to improve QSAR predictions.

Authors

Rajarshi Guha, Debojyoti Dutta, Peter C. Jurs, Ting Chen

Affiliation

Department of Chemistry, Pennsylvania State University, University Park, Pennsylvania 16802, USA.

Publication information

J Chem Inf Model. 2006 Jul-Aug;46(4):1836-47. doi: 10.1021/ci060064e.

DOI:10.1021/ci060064e

PMID:16859315

Abstract

Traditional quantitative structure-activity relationship (QSAR) models aim to capture global structure-activity trends present in a data set. In many situations, there may be groups of molecules which exhibit a specific set of features which relate to their activity or inactivity. Such a group of features can be said to represent a local structure-activity relationship. Traditional QSAR models may not recognize such local relationships. In this work, we investigate the use of local lazy regression (LLR), which obtains a prediction for a query molecule using its local neighborhood, rather than considering the whole data set. This modeling approach is especially useful for very large data sets because no a priori model need be built. We applied the technique to three biological data sets. In the first case, the root-mean-square error (RMSE) for an external prediction set was 0.94 log units versus 0.92 log units for the global model. However, LLR was able to characterize a specific group of anomalous molecules with much better accuracy (0.64 log units versus 0.70 log units for the global model). For the second data set, the LLR technique resulted in a decrease in RMSE from 0.36 log units to 0.31 log units for the external prediction set. In the third case, we obtained an RMSE of 2.01 log units versus 2.16 log units for the global model. In all cases, LLR led to a few observations being poorly predicted compared to the global model. We present an analysis of why this was observed and possible improvements to the local regression approach.
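
The idea of predicting a query molecule from its local neighborhood can be illustrated with a minimal sketch. The abstract does not specify the distance metric, the neighborhood size, or the form of the local model, so the choices here (Euclidean distance in descriptor space, a fixed `k`, and an ordinary least-squares fit with an intercept) are assumptions made for illustration only, and the function names `local_lazy_predict` and `rmse` are hypothetical.

```python
import numpy as np


def local_lazy_predict(X_train, y_train, x_query, k=10):
    """Predict one query from a linear model fit only to its k nearest neighbors.

    Assumptions (not taken from the paper): Euclidean distance,
    fixed k, ordinary least squares with an intercept term.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Distance from the query to every training molecule in descriptor space.
    dists = np.linalg.norm(X_train - x_query, axis=1)

    # Indices of the k closest training molecules (the local neighborhood).
    idx = np.argsort(dists)[:k]

    # Fit a linear model on the neighborhood only; no global model is built.
    design = np.column_stack([np.ones(len(idx)), X_train[idx]])
    coef, *_ = np.linalg.lstsq(design, y_train[idx], rcond=None)

    # Evaluate the local model at the query point.
    return float(coef[0] + x_query @ coef[1:])


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Because the model is fit at query time, the cost is paid per prediction rather than up front. If `k` is smaller than the number of descriptors plus one, the local fit is underdetermined and `np.linalg.lstsq` returns the minimum-norm solution, which is one plausible way a few observations could end up poorly predicted.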


Similar articles

A new strategy to improve the predictive ability of the local lazy regression and its application to the QSAR study of melanin-concentrating hormone receptor 1 antagonists.

J Comput Chem. 2010 Apr 15;31(5):973-85. doi: 10.1002/jcc.21383.

Local and global quantitative structure-activity relationship modeling and prediction for the baseline toxicity.

J Chem Inf Model. 2007 Jan-Feb;47(1):159-69. doi: 10.1021/ci600299j.

Prediction of retention indices of drugs based on immobilized artificial membrane chromatography using Projection Pursuit Regression and Local Lazy Regression.

J Sep Sci. 2008 Jul;31(12):2325-33. doi: 10.1002/jssc.200700665.

Determination and prediction of xenoestrogens by recombinant yeast-based assay and QSAR.

Chemosphere. 2009 Mar;74(9):1152-7. doi: 10.1016/j.chemosphere.2008.11.081. Epub 2009 Jan 10.

Mode of action-based local QSAR modeling for the prediction of acute toxicity in the fathead minnow.

J Mol Graph Model. 2007 Jul;26(1):327-35. doi: 10.1016/j.jmgm.2006.12.009. Epub 2006 Dec 16.

Global, local and novel consensus quantitative structure-activity relationship studies of 4-(Phenylaminomethylene) isoquinoline-1, 3 (2H, 4H)-diones as potent inhibitors of the cyclin-dependent kinase 4.

Anal Chim Acta. 2009 Jun 30;644(1-2):17-24. doi: 10.1016/j.aca.2009.04.019. Epub 2009 Apr 19.

Development of linear, ensemble, and nonlinear models for the prediction and interpretation of the biological activity of a set of PDGFR inhibitors.

J Chem Inf Comput Sci. 2004 Nov-Dec;44(6):2179-89. doi: 10.1021/ci049849f.

Determining the validity of a QSAR model--a classification approach.

J Chem Inf Model. 2005 Jan-Feb;45(1):65-73. doi: 10.1021/ci0497511.

Stochastic versus stepwise strategies for quantitative structure-activity relationship generation--how much effort may the mining for successful QSAR models take?

J Chem Inf Model. 2007 May-Jun;47(3):927-39. doi: 10.1021/ci600476r. Epub 2007 May 5.

Cited by

Exposing the Limitations of Molecular Machine Learning with Activity Cliffs.

J Chem Inf Model. 2022 Dec 12;62(23):5938-5951. doi: 10.1021/acs.jcim.2c01073. Epub 2022 Dec 1.

Methodology of aiQSAR: a group-specific approach to QSAR modelling.

J Cheminform. 2019 Apr 3;11(1):27. doi: 10.1186/s13321-019-0350-y.

Trovafloxacin enhances lipopolysaccharide-stimulated production of tumor necrosis factor-α by macrophages: role of the DNA damage response.

J Pharmacol Exp Ther. 2014 Jul;350(1):164-70. doi: 10.1124/jpet.114.214189. Epub 2014 May 9.

lazar: a modular predictive toxicology framework.

Front Pharmacol. 2013 Apr 9;4:38. doi: 10.3389/fphar.2013.00038. eCollection 2013.

Current mathematical methods used in QSAR/QSPR studies.

Int J Mol Sci. 2009 Apr 29;10(5):1978-1998. doi: 10.3390/ijms10051978.

Pre-docking filter for protein and ligand 3D structures.

Bioinformation. 2008;3(5):189-93. doi: 10.6026/97320630003189. Epub 2008 Dec 31.

Utilizing high throughput screening data for predictive toxicology models: protocols and application to MLSCN assays.

J Comput Aided Mol Des. 2008 Jun-Jul;22(6-7):367-84. doi: 10.1007/s10822-008-9192-9. Epub 2008 Feb 19.
