Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs.

Author information

Park Keun-Joon, Kanehisa Minoru

Affiliation

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan.

Publication information

Bioinformatics. 2003 Sep 1;19(13):1656-63. doi: 10.1093/bioinformatics/btg222.

Abstract

MOTIVATION

The subcellular location of a protein is closely correlated to its function. Thus, computational prediction of subcellular locations from the amino acid sequence information would help annotation and functional prediction of protein coding genes in complete genomes. We have developed a method based on support vector machines (SVMs).

RESULTS

We considered 12 subcellular locations in eukaryotic cells: chloroplast, cytoplasm, cytoskeleton, endoplasmic reticulum, extracellular medium, Golgi apparatus, lysosome, mitochondrion, nucleus, peroxisome, plasma membrane, and vacuole. We constructed a data set of proteins with known locations from the SWISS-PROT database. A set of SVMs was trained to predict the subcellular location of a given protein based on its amino acid, amino acid pair, and gapped amino acid pair compositions. The predictors based on these different compositions were then combined using a voting scheme. Results obtained through 5-fold cross-validation tests showed an improvement in prediction accuracy over the algorithm based on the amino acid composition only. This prediction method is available via the Internet.
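
The three sequence representations and the voting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the gap lengths, the normalization, or the tie-breaking rule, so the function names, the `gap` parameter, the normalization by total count, and the first-label tie-break are all assumptions made for illustration. The SVM training step is omitted; in practice each feature vector would be passed to a separately trained classifier, and the resulting labels would be combined by the vote.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order that defines feature positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return 20 features: the fraction of each standard residue in seq."""
    counts = Counter(c for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def pair_composition(seq, gap=0):
    """Return 400 features: the fraction of each ordered residue pair.

    gap=0 counts adjacent pairs; gap=k counts pairs separated by k
    intervening residues (the gap lengths are an assumption here).
    Pairs containing a non-standard residue are skipped.
    """
    step = gap + 1
    pairs = Counter()
    for i in range(len(seq) - step):
        a, b = seq[i], seq[i + step]
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            pairs[a + b] += 1
    total = sum(pairs.values())
    return [
        pairs[a + b] / total if total else 0.0
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label listed first.

    The tie-breaking rule is an assumption, not taken from the paper.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    return next(p for p in predictions if counts[p] == best)
```

For example, `pair_composition(seq, gap=1)` yields the composition of pairs with one intervening residue, and `majority_vote(["nucleus", "cytoplasm", "nucleus"])` selects the label predicted by most of the individual classifiers.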
