Extending the trend vector: the trend matrix and sample-based partial least squares.

Authors

Sheridan R P, Nachbar R B, Bush B L

Affiliation

Molecular Systems Department, Merck Research Laboratories, Rahway, NJ 07065.

Publication information

J Comput Aided Mol Des. 1994 Jun;8(3):323-40. doi: 10.1007/BF00126749.

DOI:10.1007/BF00126749
PMID:7964931
Abstract

Trend vector analysis [Carhart, R.E. et al., J. Chem. Inf. Comput. Sci., 25 (1985) 64], in combination with topological descriptors such as atom pairs, has proved useful in drug discovery for ranking large collections of chemical compounds in order of predicted biological activity. The compounds with the highest predicted activities, upon being tested, often show a several-fold increase in the fraction of active compounds relative to a randomly selected set. A trend vector is simply the one-dimensional array of correlations between the biological activity of interest and a set of properties or 'descriptors' of compounds in a training set. This paper examines two methods for generalizing the trend vector to improve the predicted rank order. The trend matrix method finds the correlations between the residuals and the simultaneous occurrence of descriptors, which are stored in a two-dimensional analog of the trend vector. The SAMPLS method derives a linear model by partial least squares (PLS), using the 'sample-based' formulation of PLS [Bush, B.L. and Nachbar, R.B., J. Comput.-Aided Mol. Design, 7 (1993) 587] for efficiency in treating the large number of descriptors. PLS accumulates a predictive model as a sum of linear components. Expressed as a vector of prediction coefficients on properties, the first PLS component is proportional to the trend vector. Subsequent components adjust the model toward full least squares. For both methods the residuals decrease, while the risk of overfitting the training set increases. We therefore also describe statistical checks to prevent overfitting. These methods are applied to two data sets, a small homologous series of disubstituted piperidines, tested on the dopamine receptor, and a large set of diverse chemical structures, some of which are active at the muscarinic receptor. Each data set is split into a training set and a test set, and the activities in the test set are predicted from a fit on the training set. 
Both the trend matrix and the SAMPLS approach improve the predictions over the simple trend vector. The SAMPLS approach is superior to the trend matrix in that it requires much less storage and CPU time. It also provides a useful set of axes for visualizing properties of the compounds. We describe a randomization method to determine the optimum number of PLS components that is very much faster for large training sets than leave-one-out cross-validation.
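
The abstract's statement that the first PLS component is proportional to the trend vector can be checked numerically. The sketch below is illustrative only and is not the authors' implementation. It assumes the "correlations" are taken in the unnormalized, covariance-like sense (centered descriptors times centered activity); with Pearson-normalized correlations the proportionality would hold only for descriptors scaled to unit variance. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def trend_vector(X, y):
    """Centered cross-product of each descriptor with the activity.

    Assumption: 'correlation' is read as an unnormalized covariance.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return Xc.T @ yc

def pls1_first_component_coefs(X, y):
    """Prediction coefficients of a one-component PLS model (single response)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc                 # weight vector of the first component
    w /= np.linalg.norm(w)
    t = Xc @ w                    # sample scores on the first component
    q = (t @ yc) / (t @ t)        # regression of activity on the scores
    return q * w                  # coefficients expressed on the descriptors

# Synthetic data: 30 "compounds" with 8 descriptor counts (hypothetical).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(30, 8)).astype(float)
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=30)

tv = trend_vector(X, y)
b1 = pls1_first_component_coefs(X, y)

# Cosine of the angle between the two vectors; proportional vectors give 1.
cosine = (tv @ b1) / (np.linalg.norm(tv) * np.linalg.norm(b1))

# Rank compounds by predicted activity using the trend vector as a score.
scores = (X - X.mean(axis=0)) @ tv
ranking = np.argsort(-scores)
```

Because the two vectors differ only by a positive scale factor under this assumption, ranking compounds by either one gives the same order; later PLS components are what change the ranking.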

Similar articles

1
Extending the trend vector: the trend matrix and sample-based partial least squares.
J Comput Aided Mol Des. 1994 Jun;8(3):323-40. doi: 10.1007/BF00126749.
2
Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification
3
Comparative chemometric modeling of cytochrome 3A4 inhibitory activity of structurally diverse compounds using stepwise MLR, FA-MLR, PLS, GFA, G/PLS and ANN techniques.
Eur J Med Chem. 2009 Jul;44(7):2913-22. doi: 10.1016/j.ejmech.2008.12.004. Epub 2008 Dec 16.
4
Combinatorial QSAR of ambergris fragrance compounds.
J Chem Inf Comput Sci. 2004 Mar-Apr;44(2):582-95. doi: 10.1021/ci034203t.
5
Sample-distance partial least squares: PLS optimized for many variables, with application to CoMFA.
J Comput Aided Mol Des. 1993 Oct;7(5):587-619. doi: 10.1007/BF00124364.
6
Chemometrics-assisted simultaneous voltammetric determination of ascorbic acid, uric acid, dopamine and nitrite: application of non-bilinear voltammetric data for exploiting first-order advantage.
Talanta. 2014 Feb;119:553-63. doi: 10.1016/j.talanta.2013.11.028. Epub 2013 Nov 27.
7
Predictive QSAR modeling of CCR5 antagonist piperidine derivatives using chemometric tools.
J Enzyme Inhib Med Chem. 2009 Feb;24(1):205-23. doi: 10.1080/14756360802051297.
8
Development of linear and nonlinear predictive QSAR models and their external validation using molecular similarity principle for anti-HIV indolyl aryl sulfones.
J Enzyme Inhib Med Chem. 2008 Dec;23(6):980-95. doi: 10.1080/14756360701811379.
9
Application of validated QSAR models of D1 dopaminergic antagonists for database mining.
J Med Chem. 2005 Nov 17;48(23):7322-32. doi: 10.1021/jm049116m.
10
2D QSAR modeling and preliminary database searching for dopamine transporter inhibitors using genetic algorithm variable selection of Molconn Z descriptors.
J Med Chem. 2000 Nov 2;43(22):4151-9. doi: 10.1021/jm990472s.

Cited by

1
Measuring interference of drug-like molecules with the respiratory chain: toward the early identification of mitochondrial uncouplers in lead finding.
Assay Drug Dev Technol. 2013 Sep;11(7):408-22. doi: 10.1089/adt.2012.463. Epub 2013 Aug 30.
2
Brainstorming: weighted voting prediction of inhibitors for protein targets.
J Mol Model. 2011 Sep;17(9):2133-41. doi: 10.1007/s00894-010-0854-x. Epub 2010 Sep 21.
3
kNNsim: k-nearest neighbors similarity with genetic algorithm features optimization enhances the prediction of activity classes for small molecules.
J Mol Model. 2009 Jun;15(6):591-6. doi: 10.1007/s00894-008-0349-1. Epub 2008 Jul 29.
4
Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection.
Mol Divers. 2002;5(4):231-43. doi: 10.1023/a:1021372108686.
5
Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection.
J Comput Aided Mol Des. 2002 May-Jun;16(5-6):357-69. doi: 10.1023/a:1020869118689.
6
QSAR using 2D descriptors and TRIPOS' SIMCA.
J Comput Aided Mol Des. 1999 Sep;13(5):453-67. doi: 10.1023/a:1008091001082.

References

1
Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins.
J Am Chem Soc. 1988 Aug 1;110(18):5959-67. doi: 10.1021/ja00226a005.
2
Sample-distance partial least squares: PLS optimized for many variables, with application to CoMFA.
J Comput Aided Mol Des. 1993 Oct;7(5):587-619. doi: 10.1007/BF00124364.
3
A method for automatic generation of novel chemical structures and its potential applications to drug discovery.
J Chem Inf Comput Sci. 1991 Nov;31(4):527-30. doi: 10.1021/ci00004a016.
4
Novel piperidine sigma receptor ligands as potential antipsychotic drugs.
J Med Chem. 1992 Nov 13;35(23):4344-61. doi: 10.1021/jm00101a012.