Compugen Ltd, 72 Pinchas Rosen, Tel Aviv 69512, Israel.
Protein Eng Des Sel. 2010 May;23(5):321-6. doi: 10.1093/protein/gzp078. Epub 2010 Jan 12.
Correlated mutation analysis (CMA) is a sequence-based approach for ab initio protein contact map prediction. The basis of this approach is the observed correlation between mutations in interacting amino acid residues. These correlations are often estimated by either calculating the Pearson's correlation coefficient (PCC) or the mutual information (MI) between columns in a multiple sequence alignment (MSA) of the protein of interest and its homologs. A major challenge of CMA is to filter out the background noise originating from phylogenetic relatedness between sequences included in the MSA. Recently, a procedure to reduce this background noise was demonstrated to improve an MI-based predictor. Herein, we tested whether a similar approach can also improve the performance of the classical PCC-based method. Indeed, performance improvements were achieved for all four major SCOP classes. Furthermore, the results reveal that the improved PCC-based method is superior to MI-based methods for proteins having MSAs of up to 100 sequences.
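The two column-pair scores and the background-correction idea described above can be illustrated with a minimal sketch on a toy alignment. Several choices here are assumptions rather than details taken from the abstract: the function names are hypothetical, the PCC variant scores sequence pairs with a simple identity similarity (published PCC-based methods typically use a substitution matrix instead), and the correction shown is the average product correction (APC), used only as one known example of a background-noise correction, since the abstract does not name the procedure it refers to.

```python
import math
from collections import Counter


def column(msa, i):
    """Return column i of an MSA given as a list of equal-length strings."""
    return [seq[i] for seq in msa]


def mutual_information(col_a, col_b):
    """MI (in nats) between two alignment columns, from raw frequencies."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def pair_similarity(col):
    """Similarity of every sequence pair (k < l) at one position.

    Assumption: identity similarity (1 if residues match, else 0),
    chosen only to keep the sketch short.
    """
    n = len(col)
    return [1.0 if col[k] == col[l] else 0.0
            for k in range(n) for l in range(k + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def pcc_score(msa, i, j):
    """PCC between the pairwise-similarity profiles of columns i and j."""
    return pearson(pair_similarity(column(msa, i)),
                   pair_similarity(column(msa, j)))


def apc_correct(scores):
    """Average product correction of a symmetric score matrix.

    corrected[i][j] = scores[i][j] - mean_i * mean_j / overall_mean,
    where mean_i averages column i's scores against all other columns.
    The diagonal is ignored and left at 0.0.
    """
    n = len(scores)
    col_mean = [sum(scores[i][j] for j in range(n) if j != i) / (n - 1)
                for i in range(n)]
    overall = sum(col_mean) / n
    corrected = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                background = (col_mean[i] * col_mean[j] / overall
                              if overall else 0.0)
                corrected[i][j] = scores[i][j] - background
    return corrected


# Toy alignment: columns 0 and 1 co-vary perfectly, column 2 is conserved.
msa = ["ACDE", "ACDF", "GHDE", "GHDF"]
length = len(msa[0])
mi_matrix = [[mutual_information(column(msa, i), column(msa, j)) if i != j else 0.0
              for j in range(length)] for i in range(length)]
mi_apc = apc_correct(mi_matrix)
```

In this toy case the co-varying pair (columns 0 and 1) receives the highest score under both MI and the PCC variant, while any pair involving the conserved column scores zero. Real alignments would additionally need gap handling and sequence weighting, which this sketch omits.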