Enhancing membrane protein subcellular localization prediction by parallel fusion of multi-view features.

Affiliation

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.

Publication information

IEEE Trans Nanobioscience. 2012 Dec;11(4):375-85. doi: 10.1109/TNB.2012.2208473. Epub 2012 Aug 3.

DOI:10.1109/TNB.2012.2208473

Abstract

Membrane proteins are encoded by about 30% of the genes in a genome and play important roles in living organisms. Previous studies have revealed that the structures and functions of membrane proteins show clear organelle-specific properties. Because wet-lab studies of membrane proteins are extremely difficult, it is highly desirable to predict a membrane protein's subcellular location from its primary sequence. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches are built on statistical machine learning algorithms with a serial combination of multi-view features, i.e., different feature vectors are simply serially combined to form one super feature vector. However, such a simple combination also increases information redundancy, which can in turn degrade the final prediction accuracy. This is why prediction success rates in the serial super space have often been found to be even lower than those in a single-view space. The purpose of this paper is to investigate a proper method for fusing multi-view protein sequence features for subcellular location prediction. Instead of the serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes, which represents protein samples in complex spaces. We also propose generalized principal component analysis (GPCA) for feature reduction in the complex geometry. Experimental results obtained with different machine learning algorithms on benchmark membrane protein subcellular localization datasets demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset, indicating that the parallel technique is flexible enough to suit other computational biology problems.
The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.
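
The abstract gives no formulas, so the following is only a minimal sketch of the contrast it describes. It assumes the common formulation of parallel feature fusion, in which two real feature vectors become the real and imaginary parts of one complex vector, and it reads "complex space" as a complex-valued vector space. The function names (`serial_fuse`, `parallel_fuse`, `complex_pca`), the zero-padding of the shorter view, and the random data are all hypothetical. `complex_pca` is ordinary PCA on a Hermitian covariance matrix, used here as a stand-in for GPCA, whose exact definition is not stated in the abstract.

```python
import numpy as np


def serial_fuse(a, b):
    """Serial strategy: concatenate two views into one real 'super' vector."""
    return np.concatenate([a, b], axis=1)


def parallel_fuse(a, b):
    """Parallel strategy (assumed form): view a is the real part and
    view b the imaginary part. The shorter view is zero-padded."""
    d = max(a.shape[1], b.shape[1])
    a = np.pad(a, ((0, 0), (0, d - a.shape[1])))
    b = np.pad(b, ((0, 0), (0, d - b.shape[1])))
    return a + 1j * b


def complex_pca(z, k):
    """PCA in a complex space: keep the k leading eigenvectors of the
    Hermitian covariance matrix. Returns projections and eigenvalues."""
    zc = z - z.mean(axis=0)                     # center each feature
    cov = zc.T @ zc.conj() / (len(z) - 1)       # d x d Hermitian matrix
    vals, vecs = np.linalg.eigh(cov)            # real eigenvalues, ascending
    order = np.argsort(vals)[::-1][:k]          # indices of the k largest
    u = vecs[:, order]
    return zc @ u.conj(), vals[order]           # y_i = U^H z_i for each sample


# Hypothetical data: 6 proteins, two feature views of different length.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(6, 3))
view_b = rng.normal(size=(6, 5))

s = serial_fuse(view_a, view_b)      # shape (6, 8): dimensions add up
z = parallel_fuse(view_a, view_b)    # shape (6, 5): dimension of the longer view
y, eigenvalues = complex_pca(z, k=2) # shape (6, 2), complex-valued
```

Under these assumptions the serial vector grows with every added view, while the parallel vector keeps the dimension of the longer view. Downstream classifiers would then need to handle complex-valued inputs, for example by working with magnitudes or with stacked real and imaginary parts; which of these the paper uses is not stated in the abstract.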

Similar articles

Learning protein multi-view features in complex space.

Amino Acids. 2013 May;44(5):1365-79. doi: 10.1007/s00726-013-1472-6. Epub 2013 Feb 28.

Going from where to why--interpretable prediction of protein subcellular localization.

Bioinformatics. 2010 May 1;26(9):1232-8. doi: 10.1093/bioinformatics/btq115. Epub 2010 Mar 17.

Prediction of protein subcellular localization.

Proteins. 2006 Aug 15;64(3):643-51. doi: 10.1002/prot.21018.

Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features.

Bioinformatics. 2017 Mar 15;33(6):843-853. doi: 10.1093/bioinformatics/btw723.

SubCellProt: predicting protein subcellular localization using machine learning approaches.

In Silico Biol. 2009;9(1-2):35-44.

CE-PLoc: an ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition.

Comput Biol Chem. 2011 Aug 10;35(4):218-29. doi: 10.1016/j.compbiolchem.2011.05.003. Epub 2011 May 27.

Subcellular localization prediction for human internal and organelle membrane proteins with projected gene ontology scores.

J Theor Biol. 2012 Nov 21;313:61-7. doi: 10.1016/j.jtbi.2012.08.016. Epub 2012 Aug 23.

Multilabel learning for protein subcellular location prediction.

IEEE Trans Nanobioscience. 2012 Sep;11(3):237-43. doi: 10.1109/TNB.2012.2212249.

SOMPNN: an efficient non-parametric model for predicting transmembrane helices.

Amino Acids. 2012 Jun;42(6):2195-205. doi: 10.1007/s00726-011-0959-2. Epub 2011 Jun 22.

Cited by

A Review for Artificial Intelligence Based Protein Subcellular Localization.

Biomolecules. 2024 Mar 27;14(4):409. doi: 10.3390/biom14040409.

Identification and Expression Pattern of EZH2 in Pig Developing Fetuses.

Biomed Res Int. 2020 Oct 5;2020:5315930. doi: 10.1155/2020/5315930. eCollection 2020.

Prediction of Protein Subcellular Localization Based on Fusion of Multi-view Features.

Molecules. 2019 Mar 6;24(5):919. doi: 10.3390/molecules24050919.

A Self-Training Subspace Clustering Algorithm under Low-Rank Representation for Cancer Classification on Gene Expression Data.

IEEE/ACM Trans Comput Biol Bioinform. 2018 Jul-Aug;15(4):1315-1324. doi: 10.1109/TCBB.2017.2712607. Epub 2017 Jun 6.

A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data.

Int J Mol Sci. 2016 Feb 6;17(2):218. doi: 10.3390/ijms17020218.

TargetFreeze: Identifying Antifreeze Proteins via a Combination of Weights using Sequence Evolutionary Information and Pseudo Amino Acid Composition.

J Membr Biol. 2015 Dec;248(6):1005-14. doi: 10.1007/s00232-015-9811-z. Epub 2015 Jun 10.

Enhancing protein-vitamin binding residues prediction by multiple heterogeneous subspace SVMs ensemble.

BMC Bioinformatics. 2014 Sep 5;15(1):297. doi: 10.1186/1471-2105-15-297.
