School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.
Amino Acids. 2013 May;44(5):1365-79. doi: 10.1007/s00726-013-1472-6. Epub 2013 Feb 28.
Protein attribute prediction from primary sequences is an important task, and extracting discriminative features is one of its most crucial aspects. Because a single-view feature cannot capture all the information about a protein, fusing multi-view features is considered a promising route to improving prediction accuracy. In this paper, we propose a novel framework for protein multi-view feature fusion. First, features from different views are combined in parallel to form complex-valued feature vectors. Then, we extend classical principal component analysis to a generalized principal component analysis that extracts further features from these parallel-combined vectors, which lie in a complex space. Finally, the extracted features are used for prediction. Experimental results on different benchmark datasets and with different machine learning algorithms demonstrate that the parallel strategy outperforms the traditional serial (concatenation) approach and is particularly helpful for extracting the core information buried in multi-view feature sets. A web server for protein structural class prediction based on the proposed method (COMSPA) is freely available for academic use at: http://www.csbio.sjtu.edu.cn/bioinf/COMSPA/ .
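The two-step procedure described above (parallel combination of two views into complex vectors, then PCA generalized to a complex space) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the COMSPA implementation: the function names `parallel_combine`, `complex_pca`, and `to_real` are hypothetical, and so are three conventions they use: zero-padding the shorter view, scaling each view to unit norm before combining, and stacking real and imaginary parts for a real-valued classifier.

```python
import numpy as np


def parallel_combine(x, y):
    """Fuse two feature views into complex vectors z = x + i*y.

    Assumptions (not from the paper): the shorter view is zero-padded to the
    longer view's length, and each view is scaled to unit Frobenius norm so
    that neither dominates the combined vector.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("both views must describe the same samples")
    d = max(x.shape[1], y.shape[1])
    x = np.pad(x, ((0, 0), (0, d - x.shape[1])))
    y = np.pad(y, ((0, 0), (0, d - y.shape[1])))
    x = x / (np.linalg.norm(x) or 1.0)  # guard against an all-zero view
    y = y / (np.linalg.norm(y) or 1.0)
    return x + 1j * y


def complex_pca(z, k):
    """PCA on complex-valued rows using the Hermitian covariance matrix.

    Returns the k projected features, their variances (eigenvalues, in
    descending order), the complex projection basis, and the sample mean.
    """
    if not 1 <= k <= z.shape[1]:
        raise ValueError("k must be between 1 and the feature dimension")
    mean = z.mean(axis=0)
    zc = z - mean
    # Hermitian covariance Z^H Z / n: eigenvalues are real and non-negative.
    cov = zc.conj().T @ zc / zc.shape[0]
    vals, vecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]  # keep the k largest
    w = vecs[:, order]
    return zc @ w, vals[order], w, mean


def to_real(f):
    """Hypothetical helper: stack real and imaginary parts so that a
    real-valued classifier can consume the complex features."""
    return np.hstack([f.real, f.imag])
```

For example, with a hypothetical 20-dimensional view `x` (such as amino acid composition) and an 8-dimensional view `y`, `parallel_combine(x, y)` yields 20-dimensional complex vectors, and `complex_pca(z, 5)` reduces them to 5 complex features. Using the Hermitian covariance keeps every eigenvalue real and non-negative, so components can be ranked by captured variance exactly as in ordinary PCA, and the projected features come out mutually uncorrelated.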