Prediction of nuclear proteins using SVM and HMM models.

Authors

Kumar Manish, Raghava Gajendra P S

Affiliation

Bioinformatics Centre, Institute of Microbial Technology, Chandigarh, India.

Publication information

BMC Bioinformatics. 2009 Jan 19;10:22. doi: 10.1186/1471-2105-10-22.

DOI:10.1186/1471-2105-10-22

PMID:19152693

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2632991/

Abstract

BACKGROUND

The nucleus, a highly organized organelle, plays an important role in cellular homeostasis. Nuclear proteins are crucial for chromosomal maintenance/segregation, gene expression, RNA processing/export, and many other processes. Several methods for predicting nuclear proteins have been developed in the past. The aim of the present study is to develop a new method that predicts nuclear proteins with higher accuracy.

RESULTS

All modules were trained and tested on a non-redundant dataset and evaluated using five-fold cross-validation. First, Support Vector Machine (SVM) based modules were developed using amino acid and dipeptide compositions; they achieved Matthews correlation coefficients (MCC) of 0.59 and 0.61, respectively. Second, SVM modules were developed using split amino acid composition (SAAC), achieving a maximum MCC of 0.66. Third, a hidden Markov model (HMM) based module/profile was developed for searching exclusively nuclear and non-nuclear domains in a protein. Finally, a hybrid module was developed by combining the SVM module and the HMM profile; it achieved an MCC of 0.87 with an accuracy of 94.61%. This method performs better than existing methods when evaluated on blind/independent datasets. Our method estimated 31.51%, 21.89%, 26.31%, 25.72% and 24.95% of the proteins as nuclear proteins in the Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, mouse and human proteomes, respectively. Based on the above modules, we have developed a web server, NpPred, for predicting nuclear proteins: http://www.imtech.res.in/raghava/nppred/.
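
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how MCC and accuracy are computed from the counts of a binary confusion matrix. The function names and the counts in the usage lines are hypothetical illustrations, not values or code from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives. Returns 0.0 when the denominator is zero (a common
    convention, assumed here).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, chosen only to show the call pattern.
mcc = matthews_corrcoef(tp=90, tn=95, fp=5, fn=10)
acc = accuracy(tp=90, tn=95, fp=5, fn=10)
```

MCC ranges from -1 to 1 and accounts for all four cells of the confusion matrix, which is why it is often reported alongside accuracy when class sizes may differ.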

CONCLUSION

This study describes a highly accurate method for predicting nuclear proteins. For the first time, an SVM module for predicting nuclear proteins has been developed using SAAC, in which the amino acid compositions of the N-terminus and of the remaining protein are computed separately. In addition, this study is the first to identify exclusively nuclear and non-nuclear domains and use them for predicting nuclear proteins. Combining the two approaches improved the performance of the method further.
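
The split amino acid composition idea can be sketched as follows: compute one 20-value composition vector for the N-terminal segment and another for the rest of the sequence, then concatenate them. This is an illustrative sketch only. The N-terminal length of 25 residues, the percentage scaling, and the function names are assumptions; the abstract does not state the values or implementation the authors used.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Percentage of each standard amino acid in seq (20 values).

    Non-standard characters are ignored; an empty segment yields zeros.
    """
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


def split_aa_composition(seq, n_term_len=25):
    """Concatenate N-terminal and remainder compositions (40 values).

    n_term_len is a hypothetical default, not a value from the paper.
    """
    n_term, rest = seq[:n_term_len], seq[n_term_len:]
    return aa_composition(n_term) + aa_composition(rest)


# Toy sequence: 25 alanines followed by 10 cysteines.
features = split_aa_composition("A" * 25 + "C" * 10)
```

A vector like `features` could then be passed to any SVM implementation as one training example; keeping the N-terminal part separate preserves signal that whole-sequence composition would average away.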

Similar articles

Predicting sub-cellular localization of tRNA synthetases from their primary structures.

Amino Acids. 2012 May;42(5):1703-13. doi: 10.1007/s00726-011-0872-8. Epub 2011 Mar 13.

A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search.

In Silico Biol. 2008;8(2):129-40.

Prediction of mitochondrial proteins of malaria parasite using split amino acid composition and PSSM profile.

Amino Acids. 2010 Jun;39(1):101-10. doi: 10.1007/s00726-009-0381-1. Epub 2009 Nov 12.

BTXpred: prediction of bacterial toxins.

In Silico Biol. 2007;7(4-5):405-12.

Oxypred: prediction and classification of oxygen-binding proteins.

Genomics Proteomics Bioinformatics. 2007 Dec;5(3-4):250-2. doi: 10.1016/S1672-0229(08)60012-1.

Support vector machine based prediction of glutathione S-transferase proteins.

Protein Pept Lett. 2007;14(6):575-80. doi: 10.2174/092986607780990046.

Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs.

BMC Bioinformatics. 2007 Sep 13;8:337. doi: 10.1186/1471-2105-8-337.

DPROT: prediction of disordered proteins using evolutionary information.

Amino Acids. 2008 Oct;35(3):599-605. doi: 10.1007/s00726-008-0085-y. Epub 2008 Apr 19.

ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST.

Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W414-9. doi: 10.1093/nar/gkh350.

Cited by

Bird Eye View of Protein Subcellular Localization Prediction.

Life (Basel). 2020 Dec 14;10(12):347. doi: 10.3390/life10120347.

To Decipher the Proteins Targeting into the Endoplasmic Reticulum and Their Implications in Prostate Cancer Etiology Using Next-Generation Sequencing Data.

Molecules. 2018 Apr 24;23(5):994. doi: 10.3390/molecules23050994.

Prediction of endoplasmic reticulum resident proteins using fragmented amino acid composition and support vector machine.

PeerJ. 2017 Sep 4;5:e3561. doi: 10.7717/peerj.3561. eCollection 2017.

Computational prediction of Mycoplasma hominis proteins targeting in nucleus of host cell and their implication in prostate cancer etiology.

Tumour Biol. 2016 Aug;37(8):10805-13. doi: 10.1007/s13277-016-4970-9. Epub 2016 Feb 13.

NRfamPred: a proteome-scale two level method for prediction of nuclear receptor proteins and their sub-families.

Sci Rep. 2014 Oct 29;4:6810. doi: 10.1038/srep06810.

Protein sub-nuclear localization prediction using SVM and Pfam domain information.

PLoS One. 2014 Jun 4;9(6):e98345. doi: 10.1371/journal.pone.0098345. eCollection 2014.

Hybrid approach for predicting coreceptor used by HIV-1 from its V3 loop amino acid sequence.

PLoS One. 2013 Apr 15;8(4):e61437. doi: 10.1371/journal.pone.0061437. Print 2013.

Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing.

BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S13. doi: 10.1186/1471-2105-13-S17-S13. Epub 2012 Dec 13.

The Nucleocapsid Protein of Potato Yellow dwarf Virus: Protein Interactions and Nuclear Import Mediated by a Non-Canonical Nuclear Localization Signal.

Front Plant Sci. 2012 Feb 2;3:14. doi: 10.3389/fpls.2012.00014. eCollection 2012.

The pancreatic beta cell surface proteome.

Diabetologia. 2012 Jul;55(7):1877-89. doi: 10.1007/s00125-012-2531-3. Epub 2012 Mar 31.

References

Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs.

BMC Bioinformatics. 2007 Sep 13;8:337. doi: 10.1186/1471-2105-8-337.

NucPred--predicting nuclear localization of proteins.

Bioinformatics. 2007 May 1;23(9):1159-60. doi: 10.1093/bioinformatics/btm066. Epub 2007 Mar 1.

BaCelLo: a balanced subcellular localization predictor.

Bioinformatics. 2006 Jul 15;22(14):e408-16. doi: 10.1093/bioinformatics/btl222.

Pfam: clans, web tools and services.

Nucleic Acids Res. 2006 Jan 1;34(Database issue):D247-51. doi: 10.1093/nar/gkj149.

Prediction of mitochondrial proteins using support vector machine and hidden Markov model.

J Biol Chem. 2006 Mar 3;281(9):5357-63. doi: 10.1074/jbc.M511061200. Epub 2005 Dec 8.

LOCSVMPSI: a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST.

Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W105-10. doi: 10.1093/nar/gki359.

Mimicking cellular sorting improves prediction of subcellular localization.

J Mol Biol. 2005 Apr 22;348(1):85-100. doi: 10.1016/j.jmb.2005.02.025.

PSLpred: prediction of subcellular localization of bacterial proteins.

Bioinformatics. 2005 May 15;21(10):2522-4. doi: 10.1093/bioinformatics/bti309. Epub 2005 Feb 4.

Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search.

J Biol Chem. 2005 Apr 15;280(15):14427-32. doi: 10.1074/jbc.M411789200. Epub 2005 Jan 12.

ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST.

Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W414-9. doi: 10.1093/nar/gkh350.
