A novel representation of protein sequences for prediction of subcellular location using support vector machines.

Author information

Matsuda Setsuro, Vert Jean-Philippe, Saigo Hiroto, Ueda Nobuhisa, Toh Hiroyuki, Akutsu Tatsuya

Affiliation

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan.

Publication information

Protein Sci. 2005 Nov;14(11):2804-13. doi: 10.1110/ps.051597405.

PMID: 16251364

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2253224/

Abstract

As the number of complete genomes rapidly increases, accurate methods for automatically predicting the subcellular location of proteins are increasingly useful for functional annotation. To improve on the predictive accuracy of the many methods developed to date, a novel representation of protein sequences is proposed. This representation combines local compositions of amino acids and twin amino acids with local frequencies of the distance between successive (basic, hydrophobic, and other) amino acids. To calculate the local features, each sequence is split into three parts: N-terminal, middle, and C-terminal. The N-terminal part is further divided into four regions to account for ambiguity in the length and position of signal sequences. We tested this representation with support vector machines on two data sets extracted from the SWISS-PROT database. In fivefold cross-validation tests, overall accuracies of more than 87% and 91% were obtained for eukaryotic and prokaryotic proteins, respectively. We conclude that considering the respective features of the N-terminal, middle, and C-terminal parts helps predict subcellular location.
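The abstract does not give region lengths, residue-group memberships, or distance bins, so the Python sketch below fills them with hypothetical values (`n_len`, `c_len`, `BASIC`, `HYDROPHOBIC`, `max_d`) purely to illustrate how such local features could be assembled; it is not the authors' encoding. It splits a sequence into N-terminal, middle, and C-terminal parts, cuts the N-terminal part into four regions, and concatenates per-part amino acid composition, twin amino acid composition (read here, as an assumption, as identical adjacent residues), and distance-frequency histograms for the three residue groups.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical group memberships: the abstract names "basic", "hydrophobic"
# and "other" residues but does not list which amino acids belong to each.
BASIC = frozenset("KRH")
HYDROPHOBIC = frozenset("AVLIMFWC")


def group_of(aa):
    """Assign a residue to one of the three (assumed) groups."""
    if aa in BASIC:
        return "basic"
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    return "other"


def split_parts(seq, n_len=40, c_len=40):
    """Split into N-terminal, middle and C-terminal parts.

    n_len and c_len are illustrative choices, not values from the paper.
    Sequences shorter than n_len + c_len yield an empty middle part and
    overlapping terminal parts.
    """
    n_part = seq[:n_len]
    middle = seq[n_len:max(n_len, len(seq) - c_len)]
    c_part = seq[max(0, len(seq) - c_len):]
    return n_part, middle, c_part


def n_regions(n_part, k=4):
    """Cut the N-terminal part into k consecutive, roughly equal regions (an assumption)."""
    size = -(-len(n_part) // k)  # ceiling division
    return [n_part[i * size:(i + 1) * size] for i in range(k)]


def composition(segment):
    """Fraction of each of the 20 amino acids in the segment."""
    counts = Counter(segment)
    total = max(len(segment), 1)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def twin_composition(segment):
    """Fraction of adjacent position pairs holding the same residue, per amino acid."""
    twins = Counter(a for a, b in zip(segment, segment[1:]) if a == b)
    total = max(len(segment) - 1, 1)
    return [twins[aa] / total for aa in AMINO_ACIDS]


def distance_frequency(segment, group, max_d=5):
    """Normalized histogram of gaps between successive residues of one group.

    Bins cover gaps 1..max_d-1 plus a final ">= max_d" bin (binning is hypothetical).
    """
    positions = [i for i, aa in enumerate(segment) if group_of(aa) == group]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    hist = [0] * max_d
    for g in gaps:
        hist[min(g, max_d) - 1] += 1
    total = max(len(gaps), 1)
    return [h / total for h in hist]


def encode(seq):
    """Concatenate local features of the four N-terminal regions, the middle and the C-terminal part."""
    n_part, middle, c_part = split_parts(seq)
    parts = n_regions(n_part) + [middle, c_part]
    vec = []
    for part in parts:
        vec += composition(part) + twin_composition(part)
        for group in ("basic", "hydrophobic", "other"):
            vec += distance_frequency(part, group)
    return vec
```

With these assumed settings each of the six parts contributes 20 + 20 + 3 × 5 = 55 values, giving a 330-dimensional vector. Such vectors could then be passed to an SVM classifier, for example scikit-learn's `SVC`, and scored with `cross_val_score(..., cv=5)` to mirror the fivefold cross-validation described in the abstract; the kernel and parameters the authors used are not stated there.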

Similar articles

Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs.

Bioinformatics. 2003 Sep 1;19(13):1656-63. doi: 10.1093/bioinformatics/btg222.

Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition.

BMC Bioinformatics. 2007 Nov 30;8:466. doi: 10.1186/1471-2105-8-466.

A novel representation for apoptosis protein subcellular localization prediction using support vector machine.

J Theor Biol. 2009 Jul 21;259(2):361-5. doi: 10.1016/j.jtbi.2009.03.025. Epub 2009 Mar 27.

ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST.

Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W414-9. doi: 10.1093/nar/gkh350.

Signal peptide discrimination and cleavage site identification using SVM and NN.

Comput Biol Med. 2014 Feb;45:98-110. doi: 10.1016/j.compbiomed.2013.11.017. Epub 2013 Dec 1.

Subcellular localization prediction with new protein encoding schemes.

IEEE/ACM Trans Comput Biol Bioinform. 2007 Apr-Jun;4(2):227-32. doi: 10.1109/TCBB.2007.070209.

Predicting protein subcellular localization by pseudo amino acid composition with a segment-weighted and features-combined approach.

Protein Pept Lett. 2011 May;18(5):480-7. doi: 10.2174/092986611794927947.

Combining machine learning and homology-based approaches to accurately predict subcellular localization in Arabidopsis.

Plant Physiol. 2010 Sep;154(1):36-54. doi: 10.1104/pp.110.156851. Epub 2010 Jul 20.

Computational differentiation of N-terminal signal peptides and transmembrane helices.

Biochem Biophys Res Commun. 2003 Dec 26;312(4):1278-83. doi: 10.1016/j.bbrc.2003.11.069.

Cited by

Peptide based vaccine designing against endemic causing mammarenavirus using reverse vaccinology approach.

Arch Microbiol. 2024 Apr 15;206(5):217. doi: 10.1007/s00203-024-03942-4.

Subcellular Proteomics as a Unified Approach of Experimental Localizations and Computed Prediction Data for Arabidopsis and Crop Plants.

Adv Exp Med Biol. 2021;1346:67-89. doi: 10.1007/978-3-030-80352-0_4.

Aldehyde Dehydrogenase 3 Is an Expanded Gene Family with Potential Adaptive Roles in Chickpea.

Plants (Basel). 2021 Nov 10;10(11):2429. doi: 10.3390/plants10112429.

Auxin Metabolome Profiling in the Arabidopsis Endoplasmic Reticulum Using an Optimised Organelle Isolation Protocol.

Int J Mol Sci. 2021 Aug 29;22(17):9370. doi: 10.3390/ijms22179370.

ECMPride: prediction of human extracellular matrix proteins based on the ideal dataset using hybrid features with domain evidence.

PeerJ. 2020 Apr 29;8:e9066. doi: 10.7717/peerj.9066. eCollection 2020.

PSO-LocBact: A Consensus Method for Optimizing Multiple Classifier Results for Predicting the Subcellular Localization of Bacterial Proteins.

Biomed Res Int. 2019 Nov 19;2019:5617153. doi: 10.1155/2019/5617153. eCollection 2019.

Identification of a Ribosomal Protein RpsB as a Surface-Exposed Protein and Adhesin of .

Biomed Res Int. 2019 Jul 9;2019:9297129. doi: 10.1155/2019/9297129. eCollection 2019.

Encodings and models for antimicrobial peptide classification for multi-resistant pathogens.

BioData Min. 2019 Mar 4;12:7. doi: 10.1186/s13040-019-0196-x. eCollection 2019.

Identifying anticancer peptides by using a generalized chaos game representation.

J Math Biol. 2019 Jan;78(1-2):441-463. doi: 10.1007/s00285-018-1279-x. Epub 2018 Oct 5.

The meiotic regulator JASON utilizes alternative translation initiation sites to produce differentially localized forms.

J Exp Bot. 2017 Jul 10;68(15):4205-4217. doi: 10.1093/jxb/erx222.
