Protein subcellular multi-localization prediction using a min-max modular support vector machine.

Affiliation

Department of Computer Science and Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai, 201306, China.

Publication information

Int J Neural Syst. 2010 Feb;20(1):13-28. doi: 10.1142/S0129065710002206.

DOI:10.1142/S0129065710002206

PMID:20180250

Abstract

Predicting protein subcellular localization is an important problem in computational biology because it provides clues for characterizing protein function. Much research has been dedicated to developing automatic prediction tools. Most of these tools, however, focus on mono-locational proteins, i.e., they assume that a protein exists in only one location. Yet many proteins are multi-locational and carry out crucial functions in biological processes. This work aims to develop a general pattern classifier for predicting multiple subcellular locations of proteins. We use an ensemble classifier, the min-max modular support vector machine (M³-SVM), to solve protein subcellular multi-localization problems, and we propose a module decomposition method for M³-SVM based on gene ontology (GO) semantic information. Protein sequences are represented by amino acid composition combined with secondary structure and solvent accessibility information. We apply our method to two multi-locational protein data sets. The M³-SVMs show higher accuracy and efficiency than traditional SVMs using the same feature vectors, and the GO decomposition further improves prediction accuracy. Moreover, our method is considerably more accurate than existing subcellular localization predictors at predicting protein multi-localization.
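As a rough illustration of two ingredients named in the abstract, the sketch below computes a plain amino acid composition vector and applies the min-max combination rule commonly used in min-max modular classifiers. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `amino_acid_composition` and `min_max_combine`, the toy sequence, and the toy module scores are all hypothetical, and the secondary structure features, solvent accessibility features, SVM training, and GO-based decomposition described in the paper are not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue (20 values).

    Hypothetical helper: non-standard symbols are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def min_max_combine(scores):
    """Combine module outputs with the min-max rule.

    Assumption: scores[i][j] is the output of the module trained on
    positive subset i versus negative subset j. MIN is taken across
    the negative subsets, then MAX across the positive subsets.
    """
    return max(min(row) for row in scores)


# Toy inputs, invented for illustration only.
composition = amino_acid_composition("MKVLAAGK")
module_scores = [[0.9, -0.2], [0.4, 0.3]]
decision = min_max_combine(module_scores)
print(composition[0], decision)
```

In a multi-label setting, one such combined decision could be computed per subcellular location, with a protein assigned to every location whose decision value is positive; that thresholding choice is also an assumption of this sketch.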

Similar articles

Protein subcellular multi-localization prediction using a min-max modular support vector machine.

Int J Neural Syst. 2010 Feb;20(1):13-28. doi: 10.1142/S0129065710002206.

ProLoc-GO: utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization.

BMC Bioinformatics. 2008 Feb 1;9:80. doi: 10.1186/1471-2105-9-80.

SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.

BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.

Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization.

Biochem Biophys Res Commun. 2006 Aug 18;347(1):150-7. doi: 10.1016/j.bbrc.2006.06.059. Epub 2006 Jun 21.

Using pseudo amino acid composition to predict protein subcellular location: approached with amino acid composition distribution.

Amino Acids. 2008 Aug;35(2):321-7. doi: 10.1007/s00726-007-0623-z. Epub 2008 Jan 22.

Prediction of protein subcellular localization.

Proteins. 2006 Aug 15;64(3):643-51. doi: 10.1002/prot.21018.

Improved prediction of subcellular location for apoptosis proteins by the dual-layer support vector machine.

Amino Acids. 2008 Aug;35(2):383-8. doi: 10.1007/s00726-007-0608-y. Epub 2007 Dec 21.

Two multi-classification strategies used on SVM to predict protein structural classes by using auto covariance.

Interdiscip Sci. 2009 Dec;1(4):315-9. doi: 10.1007/s12539-009-0066-1. Epub 2009 Nov 14.

Prediction of protein subcellular multi-localization based on the general form of Chou's pseudo amino acid composition.

Protein Pept Lett. 2012 Apr;19(4):375-87. doi: 10.2174/092986612799789369.

Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing.

BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S13. doi: 10.1186/1471-2105-13-S17-S13. Epub 2012 Dec 13.

Cited by

ProtPlat: an efficient pre-training platform for protein classification based on FastText.

BMC Bioinformatics. 2022 Feb 11;23(1):66. doi: 10.1186/s12859-022-04604-2.

Improving Protein Subcellular Location Classification by Incorporating Three-Dimensional Structure Information.

Biomolecules. 2021 Oct 29;11(11):1607. doi: 10.3390/biom11111607.

Discrimination of schizophrenia auditory hallucinators by machine learning of resting-state functional MRI.

Int J Neural Syst. 2015 May;25(3):1550007. doi: 10.1142/S0129065715500070. Epub 2015 Jan 19.

HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins.

PLoS One. 2014 Mar 19;9(3):e89545. doi: 10.1371/journal.pone.0089545. eCollection 2014.
