Support vector machine (SVM) based multiclass prediction with basic statistical analysis of plasminogen activators.

Author information

Muthukrishnan Selvaraj, Puri Munish, Lefevre Christophe

Affiliations

Fermentation and Protein Biotechnology Laboratory, Department of Biotechnology, Punjabi University, Patiala, India; CSIR-IMTECH, Chandigarh, India.

Publication information

BMC Res Notes. 2014 Jan 27;7:63. doi: 10.1186/1756-0500-7-63.

DOI:10.1186/1756-0500-7-63

PMID:24468032

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3924408/

Abstract

BACKGROUND

Plasminogen (Pg), the precursor of the proteolytic and fibrinolytic enzyme of blood, is converted to the active enzyme plasmin (Pm) by different plasminogen activators. These include tissue plasminogen activator and urokinase as well as the bacterial activators streptokinase and staphylokinase. Because these activators convert Pg to Pm, they are used clinically for thrombolysis. Identifying Pg-activators is therefore an important step towards understanding their functional mechanism and deriving new therapies.

METHODS

In this study, we investigated computational methods for predicting plasminogen activator peptide sequences with high accuracy. Support vector machine (SVM) models based on amino acid composition (AC), dipeptide composition (DC), position-specific scoring matrix (PSSM) profile and Hybrid methods were used to predict different Pg-activators of both prokaryotic and eukaryotic origin.
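The two composition-based feature sets named above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the use of percentages rather than fractions, and the choice to skip non-standard residues are all assumptions made for the example.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature vectors align.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return a 20-element vector: percentage of each standard residue.

    Hypothetical helper; non-standard symbols (e.g. X) are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    total = len(residues)
    return [100.0 * residues.count(a) / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-element vector: percentage of each ordered residue pair.

    Hypothetical helper; pairs containing a non-standard symbol are ignored.
    """
    seq = seq.upper()
    counts = {}
    total = 0
    for i in range(len(seq) - 1):
        a, b = seq[i], seq[i + 1]
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            counts[a + b] = counts.get(a + b, 0) + 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [
        100.0 * counts.get(a + b, 0) / total
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]
```

Each sequence is thereby mapped to a fixed-length vector (20 values for AC, 400 for DC) regardless of its length, which is what allows an SVM to be trained on proteins of different sizes.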

RESULTS

The overall maximum accuracy, evaluated using five-fold cross-validation, was 88.37%, 84.32%, 87.61% and 85.63%, with Matthews correlation coefficient (MCC) values of 0.87, 0.83, 0.86 and 0.85, for the amino acid composition (AC), dipeptide composition (DC), PSSM profile and Hybrid methods, respectively. We found that the different subfamilies of Pg-activators are quite closely correlated in terms of their amino acid, dipeptide, PSSM and Hybrid compositions. Our prediction results therefore show that plasminogen activators can be predicted with high accuracy from their primary sequence. Prediction performance was also cross-checked by confusion matrix and receiver operating characteristic (ROC) analysis. A web server was implemented to facilitate the prediction of Pg-activators from primary sequence data.
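As a reminder of how the two reported metrics relate to a confusion matrix, the sketch below computes overall accuracy and a one-vs-rest MCC for a single class. The abstract does not state how MCC was aggregated across classes, so the one-vs-rest treatment is an assumption, and the matrix is made-up illustration data rather than the paper's results.

```python
import math


def accuracy(cm):
    """Overall accuracy from a square confusion matrix (rows = true class)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def one_vs_rest_mcc(cm, k):
    """Matthews correlation coefficient for class k against all others.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    returning 0.0 when the denominator is zero.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # class k predicted as something else
    fp = sum(row[k] for row in cm) - tp  # other classes predicted as k
    tn = total - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical 3-class confusion matrix, for illustration only.
example_cm = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 1, 9],
]
example_accuracy = accuracy(example_cm)
example_mcc_class0 = one_vs_rest_mcc(example_cm, 0)
```

Unlike accuracy, MCC uses all four cells of the binary confusion table, so it stays informative when the subfamilies are of unequal size.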

CONCLUSION

The results show that dipeptide, PSSM profile and Hybrid based methods perform better than single amino acid composition (AC). Furthermore, we have developed a web server that predicts Pg-activators and their classification (available online at http://mamsap.it.deakin.edu.au/plas_pred/home.html). Our experimental results show that our approaches are faster and generally achieve good prediction performance.
