On the relevance of sophisticated structural annotations for disulfide connectivity pattern prediction.

Affiliation

Bioinformatics and Modeling, GIGA-Research, Department of Electrical Engineering and Computer Science, Montefiore Institute, University of Liege, Liege, Belgium.

Publication information

PLoS One. 2013;8(2):e56621. doi: 10.1371/journal.pone.0056621. Epub 2013 Feb 15.

DOI: 10.1371/journal.pone.0056621

PMID: 23533562

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3574028/

Abstract

Disulfide bridges strongly constrain the native structure of many proteins, and predicting their formation is therefore a key sub-problem of protein structure and function inference. Most recently proposed approaches to this problem adopt a three-step pipeline: first, they enrich the primary sequence with structural annotations; second, they apply a binary classifier to each candidate pair of cysteines to predict its disulfide bonding probability; finally, they use a maximum weight graph matching algorithm to derive the predicted disulfide connectivity pattern of the protein. In this paper, we adopt this three-step pipeline and present an extensive study of the relevance of various structural annotations and feature encodings. In particular, we consider five kinds of structural annotations, three of which are novel in the context of disulfide bridge prediction. To be usable by machine learning algorithms, these annotations must be encoded into features. For this purpose, we propose four feature encodings based on local windows and on different kinds of histograms. Combining the structural annotations with these encodings yields a large number of candidate feature functions. To identify a minimal subset of relevant feature functions among them, we propose an efficient and interpretable feature function selection scheme, designed to avoid any form of overfitting. We apply this scheme on top of three supervised learning algorithms: k-nearest neighbors, support vector machines and extremely randomized trees.
Our results indicate that using only the PSSM (position-specific scoring matrix) together with the CSP (cysteine separation profile) is sufficient to construct a high-performance disulfide pattern predictor, and that extremely randomized trees reach a disulfide pattern prediction accuracy of [Formula: see text] on the benchmark dataset SPX[Formula: see text], which corresponds to a [Formula: see text] improvement over the state of the art. A web application is available at http://m24.giga.ulg.ac.be:81/x3CysBridges.
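
The last step of the pipeline turns pairwise bonding probabilities into a connectivity pattern by choosing a set of disjoint cysteine pairs with maximum total weight. The sketch below illustrates that step only, assuming the classifier outputs are already available as a symmetric matrix `prob` (a hypothetical name). It searches exhaustively over matchings, which is exponential in the number of cysteines and only practical for small proteins; a real predictor would use a polynomial-time algorithm such as Edmonds' blossom algorithm (for example `networkx.max_weight_matching`).

```python
from functools import lru_cache


def max_weight_pairing(prob):
    """Return (total_weight, pairs) for a maximum-weight matching of cysteines.

    prob: symmetric n x n nested list; prob[i][j] is the predicted bonding
    probability of cysteines i and j (the diagonal is ignored).
    Exhaustive memoized search: exponential in n, for illustration only.
    """
    n = len(prob)

    @lru_cache(maxsize=None)
    def best(remaining):
        # remaining: tuple of cysteine indices not yet assigned.
        if len(remaining) < 2:
            return 0.0, ()
        first, rest = remaining[0], remaining[1:]
        # Option 1: leave the first remaining cysteine unpaired.
        best_w, best_pairs = best(rest)
        # Option 2: bond it to one of the other remaining cysteines.
        for k, partner in enumerate(rest):
            w, pairs = best(rest[:k] + rest[k + 1:])
            w += prob[first][partner]
            if w > best_w:
                best_w, best_pairs = w, ((first, partner),) + pairs
        return best_w, best_pairs

    return best(tuple(range(n)))
```

For four cysteines with `prob[0][1] = 0.9`, `prob[2][3] = 0.8` and lower values elsewhere, the function returns the pairs `((0, 1), (2, 3))` with a total weight of about 1.7. With strictly positive weights and an even number of cysteines, the result is always a perfect matching, i.e. every cysteine is assigned a partner.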
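
A local-window encoding represents a candidate cysteine pair by the annotation values (for instance PSSM rows) of the residues surrounding each cysteine. The following is a minimal sketch of that general idea, not the authors' exact encoding: it assumes a per-residue annotation given as a list of equal-length numeric rows, zero padding beyond the sequence ends, and a caller-chosen half-window size; the function names are hypothetical.

```python
def window_features(annotation, center, half_width):
    """Concatenate annotation rows for positions center-half_width..center+half_width.

    annotation: list of per-residue feature rows (e.g. 20 PSSM scores each).
    Positions outside the sequence are padded with zeros (an assumption).
    """
    dim = len(annotation[0])
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(annotation):
            features.extend(annotation[pos])
        else:
            features.extend([0.0] * dim)
    return features


def pair_window_features(annotation, cys_i, cys_j, half_width):
    """Feature vector for a candidate pair: window around cys_i, then around cys_j."""
    return (window_features(annotation, cys_i, half_width)
            + window_features(annotation, cys_j, half_width))
```

With a 20-column PSSM and half-window size w, each cysteine contributes (2w + 1) x 20 values, so the pair vector has 2 x (2w + 1) x 20 entries; this is why window sizes must be chosen carefully when several annotations are combined.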
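
Selecting a small subset of feature functions can be illustrated with generic greedy forward selection: start from an empty set and repeatedly add the candidate that most improves a quality estimate, stopping when no candidate helps. This sketch is not the authors' selection scheme. The `score` callable is a hypothetical stand-in for, e.g., cross-validated pattern accuracy; guarding against overfitting depends on computing that score on data kept separate from the final evaluation, which the sketch leaves to the caller.

```python
def forward_select(candidates, score, min_gain=0.0):
    """Greedy forward selection of feature functions.

    candidates: iterable of hashable feature-function names.
    score: callable mapping a tuple of selected names to a quality
        estimate (higher is better), e.g. cross-validated accuracy.
    Stops when no remaining candidate improves the score by more than min_gain.
    """
    selected = []
    remaining = list(candidates)
    best_score = score(tuple(selected))
    while remaining:
        # Evaluate every one-step extension of the current subset.
        trials = [(score(tuple(selected + [c])), c) for c in remaining]
        trial_score, best_c = max(trials, key=lambda t: t[0])
        if trial_score - best_score <= min_gain:
            break
        selected.append(best_c)
        remaining.remove(best_c)
        best_score = trial_score
    return selected, best_score
```

Each iteration costs one score evaluation per remaining candidate, so the procedure needs at most on the order of k² evaluations for k candidate feature functions, far fewer than the 2^k subsets an exhaustive search would examine.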

Similar articles

On the encoding of proteins for disordered regions prediction.

PLoS One. 2013 Dec 16;8(12):e82252. doi: 10.1371/journal.pone.0082252. eCollection 2013.

A simplified approach to disulfide connectivity prediction from protein sequences.

BMC Bioinformatics. 2008 Jan 14;9:20. doi: 10.1186/1471-2105-9-20.

diSBPred: A machine learning based approach for disulfide bond prediction.

Comput Biol Chem. 2021 Apr;91:107436. doi: 10.1016/j.compbiolchem.2021.107436. Epub 2021 Jan 27.

Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

Bioinformatics. 2007 Dec 1;23(23):3147-54. doi: 10.1093/bioinformatics/btm505. Epub 2007 Oct 17.

Disulfide connectivity prediction using recursive neural networks and evolutionary information.

Bioinformatics. 2004 Mar 22;20(5):653-9. doi: 10.1093/bioinformatics/btg463. Epub 2004 Jan 22.

Predicting disulfide connectivity patterns.

Proteins. 2007 May 1;67(2):262-70. doi: 10.1002/prot.21309.

Disulfide connectivity prediction based on structural information without a prior knowledge of the bonding state of cysteines.

Comput Biol Med. 2013 Nov;43(11):1941-8. doi: 10.1016/j.compbiomed.2013.09.008. Epub 2013 Sep 18.

DBCP: a web server for disulfide bonding connectivity pattern prediction without the prior knowledge of the bonding state of cysteines.

Nucleic Acids Res. 2010 Jul;38(Web Server issue):W503-7. doi: 10.1093/nar/gkq514. Epub 2010 Jun 8.

Improving the accuracy of predicting disulfide connectivity by feature selection.

J Comput Chem. 2010 May;31(7):1478-85. doi: 10.1002/jcc.21433.

Cited by

Probabilistic divergence of a template-based modelling methodology from the ideal protocol.

J Mol Model. 2021 Jan 7;27(2):25. doi: 10.1007/s00894-020-04640-w.

Bacterial thiol oxidoreductases - from basic research to new antibacterial strategies.

Appl Microbiol Biotechnol. 2017 May;101(10):3977-3989. doi: 10.1007/s00253-017-8291-8. Epub 2017 Apr 13.

Soft Computing Methods for Disulfide Connectivity Prediction.

Evol Bioinform Online. 2015 Oct 20;11:223-9. doi: 10.4137/EBO.S25349. eCollection 2015.

On the encoding of proteins for disordered regions prediction.

PLoS One. 2013 Dec 16;8(12):e82252. doi: 10.1371/journal.pone.0082252. eCollection 2013.
