Computer and Information Sciences Department, University of Delaware, 101 Smith Hall, Newark, DE 19716, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Jul-Aug;9(4):992-1001. doi: 10.1109/TCBB.2011.136.
We present a new computational method for predicting ligand-binding residues and functional sites in protein sequences. These residues and sites tend not only to be conserved but also to exhibit strong correlation, because selection pressure during evolution maintains the required structure and/or function. To explore the effect of correlations among multiple sequence positions, the method combines graph-theoretic clustering with kernel-based canonical correlation analysis (kCCA). It identifies binding and functional sites as the residues whose evolutionary characterization correlates strongly with the structure-based functional classification of the proteins within a functional family. Tests on two well-curated data sets show that prediction accuracy, as measured by Receiver Operating Characteristic (ROC) scores, improves significantly when multipositional correlations are taken into account.
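As an illustration of the evaluation metric only, the following minimal sketch shows one common way to compute a ROC score (area under the ROC curve) for a ranking of residue positions. It is not the authors' implementation: the function name `roc_auc`, the per-residue scores, and the binding-site labels are all hypothetical, and the abstract does not specify how the method's scores are produced.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    scores: one prediction score per residue position (higher = more likely a site).
    labels: 1 for a known binding/functional residue, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six residue positions, two of them true sites.
scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8]
labels = [1, 0, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(auc)
```

A score of 1.0 means every true site is ranked above every non-site, and 0.5 corresponds to a random ranking, which is why a higher ROC score indicates more accurate prediction.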