Learning protein multi-view features in complex space.

Affiliation

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.

Publication information

Amino Acids. 2013 May;44(5):1365-79. doi: 10.1007/s00726-013-1472-6. Epub 2013 Feb 28.

Abstract

Protein attribute prediction from primary sequences is an important task, and how to extract discriminative features is one of its most crucial aspects. Because a single-view feature cannot reflect all the information of a protein, fusing multi-view features is considered a promising route to improving prediction accuracy. In this paper, we propose a novel framework for protein multi-view feature fusion: first, features from different views are combined in parallel to form complex feature vectors; then, we extend classic principal component analysis to generalized principal component analysis to further extract features from the parallel-combined complex features, which lie in a complex space. Finally, the extracted features are used for prediction. Experimental results on different benchmark datasets and machine learning algorithms demonstrate that the parallel strategy outperforms the traditional serial approach and is particularly helpful for extracting the core information buried among multi-view feature sets. A web server for protein structural class prediction based on the proposed method (COMSPA) is freely available for academic use at: http://www.csbio.sjtu.edu.cn/bioinf/COMSPA/ .
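The two steps described in the abstract (parallel combination of two feature views into complex vectors, then PCA generalized to complex space) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' COMSPA implementation: the zero-padding of the narrower view, the view weight `theta`, and stacking real and imaginary parts into a real vector for a downstream classifier are all assumptions of this sketch, and the function names are hypothetical.

```python
import numpy as np


def parallel_combine(X, Y, theta=1.0):
    """Fuse two feature views into complex vectors z = x + i * theta * y.

    X: (n, p) real features from view 1; Y: (n, q) real features from view 2.
    Assumption: the narrower view is zero-padded to max(p, q) columns, and
    theta is a hypothetical weight balancing the two views.
    """
    n, p = X.shape
    q = Y.shape[1]
    d = max(p, q)
    Xp = np.zeros((n, d))
    Yp = np.zeros((n, d))
    Xp[:, :p] = X
    Yp[:, :q] = Y
    return Xp + 1j * theta * Yp


def complex_pca(Z, k):
    """PCA generalized to complex data via the Hermitian covariance matrix.

    Returns the (n, k) complex scores, the top-k eigenvalues (descending),
    and the (d, k) matrix of orthonormal eigenvectors.
    """
    Zc = Z - Z.mean(axis=0)               # center each complex feature
    C = Zc.conj().T @ Zc / Z.shape[0]     # Hermitian covariance: C equals C^H
    vals, vecs = np.linalg.eigh(C)        # real eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]    # indices of the k largest
    W = vecs[:, order]
    scores = Zc @ W                       # mean |score_j|^2 equals eigenvalue j
    return scores, vals[order], W


def to_real_features(scores):
    """Assumption: stack real and imaginary parts for real-valued classifiers."""
    return np.hstack([scores.real, scores.imag])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))   # hypothetical 20-D first view
    Y = rng.normal(size=(100, 12))   # hypothetical 12-D second view
    Z = parallel_combine(X, Y)
    scores, eigvals, W = complex_pca(Z, k=5)
    feats = to_real_features(scores)
    print(Z.shape, feats.shape)      # prints (100, 20) (100, 10)
```

Using `numpy.linalg.eigh` rather than `eig` exploits the fact that the complex covariance matrix is Hermitian, so its eigenvalues are real and its eigenvectors are orthonormal under the conjugate inner product. Ordinary real PCA is recovered as the special case in which the imaginary view is all zeros.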

Similar articles

Enhancing membrane protein subcellular localization prediction by parallel fusion of multi-view features.

IEEE Trans Nanobioscience. 2012 Dec;11(4):375-85. doi: 10.1109/TNB.2012.2208473. Epub 2012 Aug 3.

Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features.

Bioinformatics. 2017 Mar 15;33(6):843-853. doi: 10.1093/bioinformatics/btw723.

Large-scale prediction of human protein-protein interactions from amino acid sequence based on latent topic features.

J Proteome Res. 2010 Oct 1;9(10):4992-5001. doi: 10.1021/pr100618t.

TargetCrys: protein crystallization prediction by fusing multi-view features with two-layered SVM.

Amino Acids. 2016 Nov;48(11):2533-2547. doi: 10.1007/s00726-016-2274-4. Epub 2016 Jun 14.

HYPROSP II--a knowledge-based hybrid method for protein secondary structure prediction based on local prediction confidence.

Bioinformatics. 2005 Aug 1;21(15):3227-33. doi: 10.1093/bioinformatics/bti524. Epub 2005 Jun 2.

Constructing query-driven dynamic machine learning model with application to protein-ligand binding sites prediction.

IEEE Trans Nanobioscience. 2015 Jan;14(1):45-58. doi: 10.1109/TNB.2015.2394328.

RNA-binding protein recognition based on multi-view deep feature and multi-label learning.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa174.

SOMRuler: a novel interpretable transmembrane helices predictor.

IEEE Trans Nanobioscience. 2011 Jun;10(2):121-9. doi: 10.1109/TNB.2011.2160730. Epub 2011 Jul 7.

Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection.

Bioinformatics. 2008 May 15;24(10):1264-70. doi: 10.1093/bioinformatics/btn112. Epub 2008 Mar 31.

Cited by

PSSMCOOL: a comprehensive R package for generating evolutionary-based descriptors of protein sequences from PSSM profiles.

Biol Methods Protoc. 2022 Mar 30;7(1):bpac008. doi: 10.1093/biomethods/bpac008. eCollection 2022.

Prediction of Protein-Protein Interaction Sites with Machine-Learning-Based Data-Cleaning and Post-Filtering Procedures.

J Membr Biol. 2016 Apr;249(1-2):141-53. doi: 10.1007/s00232-015-9856-z. Epub 2015 Nov 12.

TargetFreeze: Identifying Antifreeze Proteins via a Combination of Weights using Sequence Evolutionary Information and Pseudo Amino Acid Composition.

J Membr Biol. 2015 Dec;248(6):1005-14. doi: 10.1007/s00232-015-9811-z. Epub 2015 Jun 10.
