Protein subcellular multi-localization prediction using a min-max modular support vector machine.

Affiliation

Department of Computer Science and Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai, 201306, China.

Publication information

Int J Neural Syst. 2010 Feb;20(1):13-28. doi: 10.1142/S0129065710002206.

Abstract

Predicting protein subcellular localization is an important problem in computational biology because it provides clues for characterizing protein function. Much research has been devoted to automatic prediction tools, but most of them focus on mono-locational proteins, i.e., they assume that a protein exists in only one location. Many proteins, however, are multi-locational and carry out crucial functions in biological processes. This work aims to develop a general pattern classifier for predicting multiple subcellular locations of proteins. We use an ensemble classifier, the min-max modular support vector machine (M³-SVM), to solve protein subcellular multi-localization problems, and we propose a module decomposition method for M³-SVM based on gene ontology (GO) semantic information. Protein sequences are represented by amino acid composition combined with secondary structure and solvent accessibility information. We apply the method to two multi-locational protein data sets. With the same feature vectors, M³-SVMs show higher accuracy and efficiency than traditional SVMs, and the GO decomposition further improves prediction accuracy. Moreover, our method predicts protein multi-localization with much higher accuracy than existing subcellular localization predictors.
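
The abstract names two building blocks that a short example may make more concrete: the amino acid composition feature and the min-max rule that combines module outputs. The Python sketch below is illustrative only and is not the authors' implementation. It assumes the usual min-max modular scheme, in which a two-class problem is split into positive and negative training subsets, one module is trained per subset pair, and the module scores are combined by taking the minimum over negative subsets and then the maximum over positive subsets. The function names `amino_acid_composition` and `min_max_combine` are hypothetical. The secondary structure and solvent accessibility features, the GO-based decomposition, and the SVM training itself are omitted.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue as a 20-dimensional list.

    Non-standard symbols (e.g. 'X') are ignored when computing fractions.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def min_max_combine(module_scores):
    """Combine module outputs with the min-max rule (assumed scheme).

    module_scores[i][j] is the score of the module trained on positive
    subset i versus negative subset j. Each row is reduced with MIN,
    and the row results are reduced with MAX.
    """
    return max(min(row) for row in module_scores)


# Hypothetical scores from 2 positive subsets x 2 negative subsets.
scores = [[0.9, 0.2],
          [0.6, 0.7]]
combined = min_max_combine(scores)  # max(min(0.9, 0.2), min(0.6, 0.7))
features = amino_acid_composition("AAC")
```

For multi-localization, one plausible arrangement (an assumption here, not stated in the abstract) is to solve one such two-class problem per subcellular location and report every location whose combined score exceeds a threshold.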
