Unbiased descriptor and parameter selection confirms the potential of proteochemometric modelling.

Authors

Freyhult Eva, Prusis Peteris, Lapinsh Maris, Wikberg Jarl E S, Moulton Vincent, Gustafsson Mats G

Affiliation

The Linnaeus Centre for Bioinformatics, Uppsala University, Box 598, S-751 24 Uppsala, Sweden.

Publication

BMC Bioinformatics. 2005 Mar 10;6:50. doi: 10.1186/1471-2105-6-50.

DOI:10.1186/1471-2105-6-50

PMID:15760465

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC555743/

Abstract

BACKGROUND

Proteochemometrics is a new methodology that allows prediction of protein function directly from real interaction measurement data without the need for 3D structure information. Several reported proteochemometric models of ligand-receptor interactions have already yielded significant insights into various forms of bio-molecular interactions. The proteochemometric models are multivariate regression models that predict binding affinity for a particular combination of features of the ligand and protein. Although proteochemometric models have already offered interesting results in various studies, no detailed statistical evaluation of their average predictive power has been performed. In particular, variable subset selection performed to date has always relied on using all available examples, a situation also encountered in microarray gene expression data analysis.

RESULTS

A methodology for an unbiased evaluation of the predictive power of proteochemometric models was implemented and results from applying it to two of the largest proteochemometric data sets yet reported are presented. A double cross-validation loop procedure is used to estimate the expected performance of a given design method. The unbiased performance estimates (P2) obtained for the data sets that we consider confirm that properly designed single proteochemometric models have useful predictive power, but that a standard design based on cross validation may yield models with quite limited performance. The results also show that different commercial software packages employed for the design of proteochemometric models may yield very different and therefore misleading performance estimates. In addition, the differences in the models obtained in the double CV loop indicate that detailed chemical interpretation of a single proteochemometric model is uncertain when data sets are small.

CONCLUSION

The double CV loop employed offers unbiased performance estimates of a given proteochemometric modelling procedure, making it possible to identify cases where the proteochemometric design does not result in useful predictive models. Chemical interpretations of single proteochemometric models are uncertain and should instead be based on all the models selected in the double CV loop employed here.
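The double cross-validation idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the one-descriptor ridge model, the candidate settings and all function names are assumptions made for illustration, and P2 is taken here to be a Q2-style statistic (1 minus the summed squared outer-loop prediction error over the total sum of squares), which the abstract itself does not define. The point it shows is that descriptor and parameter selection happen only inside the inner loop, so the outer test folds never influence the choice.

```python
import random

def k_folds(n, k, rng):
    """Shuffle positions 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ridge_1d(xs, ys, lam):
    """Closed-form fit of y ~ a + b*x with a ridge penalty on the slope only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    b = sxy / (sxx + lam)
    return my - b * mx, b

def cv_sse(X, y, rows, setting, k, rng):
    """Summed squared CV error of one (descriptor, lambda) setting on `rows`."""
    j, lam = setting
    sse = 0.0
    for fold in k_folds(len(rows), k, rng):
        held = set(fold)
        train = [rows[i] for i in range(len(rows)) if i not in held]
        a, b = fit_ridge_1d([X[r][j] for r in train], [y[r] for r in train], lam)
        sse += sum((y[rows[i]] - (a + b * X[rows[i]][j])) ** 2 for i in fold)
    return sse

def double_cv(X, y, settings, outer_k=5, inner_k=4, seed=0):
    """Outer loop estimates performance; inner loop does all selection."""
    rng = random.Random(seed)
    n = len(y)
    preds = [None] * n
    chosen = []
    for fold in k_folds(n, outer_k, rng):
        held = set(fold)
        design = [r for r in range(n) if r not in held]
        # Inner loop: selection sees only the design rows of this outer fold.
        # A fresh, identically seeded generator gives every setting the same folds.
        best = min(settings,
                   key=lambda s: cv_sse(X, y, design, s, inner_k, random.Random(seed)))
        chosen.append(best)
        j, lam = best
        a, b = fit_ridge_1d([X[r][j] for r in design], [y[r] for r in design], lam)
        for r in fold:  # predict the rows that took no part in selection
            preds[r] = a + b * X[r][j]
    ybar = sum(y) / n
    press = sum((v - p) ** 2 for v, p in zip(y, preds))
    total = sum((v - ybar) ** 2 for v in y)
    return 1.0 - press / total, chosen

# Hypothetical toy data: only descriptor 0 carries signal.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
y = [2.0 * row[0] + data_rng.gauss(0, 0.3) for row in X]
settings = [(j, lam) for j in range(3) for lam in (0.01, 1.0, 10.0)]

p2, chosen = double_cv(X, y, settings)
print(f"P2 = {p2:.2f}")
print("setting chosen in each outer fold:", chosen)
```

The list of settings chosen per outer fold is the kind of information the conclusion refers to: when the selected descriptors vary from fold to fold, interpreting any single model is unreliable.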

Similar articles

Rough set-based proteochemometrics modeling of G-protein-coupled receptor-ligand interactions.

Proteins. 2006 Apr 1;63(1):24-34. doi: 10.1002/prot.20777.

Feature selection and nearest centroid classification for protein mass spectrometry.

BMC Bioinformatics. 2005 Mar 23;6:68. doi: 10.1186/1471-2105-6-68.

Bias in error estimation when using cross-validation for model selection.

BMC Bioinformatics. 2006 Feb 23;7:91. doi: 10.1186/1471-2105-7-91.

Many accurate small-discriminatory feature subsets exist in microarray transcript data: biomarker discovery.

BMC Bioinformatics. 2005 Apr 13;6:97. doi: 10.1186/1471-2105-6-97.

Comparison of seven methods for producing Affymetrix expression scores based on False Discovery Rates in disease profiling data.

BMC Bioinformatics. 2005 Feb 10;6:26. doi: 10.1186/1471-2105-6-26.

In silico microdissection of microarray data from heterogeneous cell populations.

BMC Bioinformatics. 2005 Mar 14;6:54. doi: 10.1186/1471-2105-6-54.

The effects of normalization on the correlation structure of microarray data.

BMC Bioinformatics. 2005 May 16;6:120. doi: 10.1186/1471-2105-6-120.

Proteochemometric mapping of the interaction of organic compounds with melanocortin receptor subtypes.

Mol Pharmacol. 2005 Jan;67(1):50-9. doi: 10.1124/mol.104.002857. Epub 2004 Oct 6.

External cross-validation for unbiased evaluation of protein family detectors: application to allergens.

Proteins. 2005 Dec 1;61(4):918-25. doi: 10.1002/prot.20656.

Cited by

Chagas Disease: Perspectives on the Past and Present and Challenges in Drug Discovery.

Molecules. 2020 Nov 23;25(22):5483. doi: 10.3390/molecules25225483.

Current computational methods for predicting protein interactions of natural products.

Comput Struct Biotechnol J. 2019 Oct 28;17:1367-1376. doi: 10.1016/j.csbj.2019.08.008. eCollection 2019.

Structural and conformational determinants of macrocycle cell permeability.

Nat Chem Biol. 2016 Dec;12(12):1065-1074. doi: 10.1038/nchembio.2203. Epub 2016 Oct 17.

Reliable estimation of prediction errors for QSAR models under model uncertainty using double cross-validation.

J Cheminform. 2014 Nov 26;6(1):47. doi: 10.1186/s13321-014-0047-1. eCollection 2014.

Proteochemometric model for predicting the inhibition of penicillin-binding proteins.

J Comput Aided Mol Des. 2015 Feb;29(2):127-41. doi: 10.1007/s10822-014-9809-0. Epub 2014 Oct 26.

Profiling of a prescription drug library for potential renal drug-drug interactions mediated by the organic cation transporter 2.

J Med Chem. 2011 Jul 14;54(13):4548-58. doi: 10.1021/jm2001629. Epub 2011 Jun 8.

Kinome-wide interaction modelling using alignment-based and alignment-independent approaches for kinase description and linear and non-linear data analysis techniques.

BMC Bioinformatics. 2010 Jun 22;11:339. doi: 10.1186/1471-2105-11-339.

Virtual screening of GPCRs: an in silico chemogenomics approach.

BMC Bioinformatics. 2008 Sep 6;9:363. doi: 10.1186/1471-2105-9-363.

The C1C2: a framework for simultaneous model selection and assessment.

BMC Bioinformatics. 2008 Sep 2;9:360. doi: 10.1186/1471-2105-9-360.

GPCRTree: online hierarchical classification of GPCR function.

BMC Res Notes. 2008 Aug 21;1:67. doi: 10.1186/1756-0500-1-67.
