College of Engineering, Shanghai Ocean University, Shanghai 201303, China.
Comput Biol Med. 2011 Aug;41(8):640-7. doi: 10.1016/j.compbiomed.2011.05.015. Epub 2011 Jun 12.
Protein remote homology detection is a critical step toward annotating protein structure and function. Supervised learning algorithms such as the support vector machine (SVM) are currently the most accurate methods. Position-specific score matrices (PSSMs) contain rich information about the evolutionary relationships of proteins. However, PSSMs of different proteins have different lengths, which makes them difficult to use directly as input to machine-learning methods. In this study, we present a simple, fast and powerful method for protein remote homology detection that combines an SVM with the auto-cross covariance (ACC) transformation. The ACC transformation converts the PSSMs into fixed-length vectors, which are then input to an SVM classifier for remote homology detection. This scheme effectively captures sequence-order effects. Experiments were performed on well-established datasets, with remote homology simulated at the superfamily and fold levels, respectively. The results show that the proposed method, referred to as ACCRe, is comparable to, or even better than, state-of-the-art methods in detection performance, and that its time complexity is lower than that of other profile-based SVM methods. The ACC transformation provides a novel way to exploit evolutionary information and can be widely used in protein-level studies.
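The abstract describes the ACC transformation only in words. The function below is a minimal sketch, assuming the auto-cross covariance formulation commonly applied to PSSMs: for a lag lg, the auto covariance of column i is AC(i, lg) = sum over j = 1..L-lg of (S[i,j] - mean_i)(S[i,j+lg] - mean_i) / (L - lg), and the cross covariance between columns i1 != i2 replaces the second factor with (S[i2,j+lg] - mean_i2). Here S[i,j] is the PSSM score of amino acid i at position j, L is the sequence length, and mean_i is the average score of column i. Taken together, the AC and CC terms give 20 x 20 = 400 values per lag, independent of L. The name `acc_transform`, the list-of-rows PSSM layout, and the parameter `max_lag` are hypothetical. The lag range and any score normalization actually used by ACCRe are not stated in this abstract.

```python
from typing import List

# Hypothetical layout: a PSSM is a list of L rows, each holding one score
# per amino-acid column (20 columns for a standard PSSM).
Pssm = List[List[float]]


def acc_transform(pssm: Pssm, max_lag: int) -> List[float]:
    """Map an L x 20 PSSM to a fixed-length vector of 400 * max_lag values.

    For every lag in 1..max_lag and every ordered column pair (i1, i2), the
    product of centred scores at positions j and j + lag is averaged over the
    L - lag valid positions. Pairs with i1 == i2 are auto-covariance (AC)
    terms; pairs with i1 != i2 are cross-covariance (CC) terms.
    """
    if max_lag < 1:
        raise ValueError("max_lag must be at least 1")
    length = len(pssm)
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n_cols = len(pssm[0])

    # Mean score of each column over the whole sequence.
    means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    # Centre every score once so the inner loop is a plain sum of products.
    centred = [[row[c] - means[c] for c in range(n_cols)] for row in pssm]

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        span = length - lag
        for i1 in range(n_cols):
            for i2 in range(n_cols):
                total = sum(centred[j][i1] * centred[j + lag][i2] for j in range(span))
                features.append(total / span)
    return features
```

With `max_lag=3`, PSSMs of 50 and 120 residues both map to 1,200-dimensional vectors, so proteins of any length can be stacked into one design matrix for an SVM implementation such as scikit-learn's `sklearn.svm.SVC`. The per-lag averaging over L - lg positions is what removes the dependence on sequence length.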