Tillier Elisabeth R M, Lui Thomas W H
Ontario Cancer Institute, University Health Network, Suite 703, 620 University Avenue, Toronto, Ontario, Canada M5G 2M9.
Bioinformatics. 2003 Apr 12;19(6):750-5. doi: 10.1093/bioinformatics/btg072.
Multiple sequence alignments of homologous proteins are useful for inferring their phylogenetic history and for revealing functionally important regions in the proteins. Functional constraints may lead to co-variation of two or more amino acids in the sequence, such that a substitution at one site is accompanied by compensatory substitutions at another site. It is not sufficient to find the statistical correlations between sites in the alignment, because these may result from several undetermined causes. In particular, phylogenetic clustering will lead to many strong correlations.
A procedure is developed to detect statistical correlations stemming from functional interaction by removing the strong phylogenetic signal that leads to the correlation of each site with many others in the sequence. Our method relies upon the accuracy of the alignment, but it does not require any assumptions about the phylogeny or the substitution process. The effectiveness of the method was verified using computer simulations, and the method was then applied to predict functional interactions between amino acids in the Pfam database of alignments.
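The abstract does not spell out the procedure, so the following Python sketch is only a rough illustration of the general idea and should not be read as the authors' method. It assumes mutual information as the correlation measure and, as a stand-in for the background correction, subtracts from each pair's score the average score that its two sites have with all other sites. The function names `mutual_information` and `background_corrected_scores` are hypothetical, and the alignment is assumed to be a list of equal-length strings.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    # Sum p(a,b) * log2( p(a,b) / (p(a) * p(b)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


def background_corrected_scores(alignment):
    """Pairwise MI minus a per-site background (an assumed correction).

    A site that correlates with many other sites, as expected under
    phylogenetic clustering, has a high background and is penalised.
    """
    columns = list(zip(*alignment))  # transpose rows into columns
    n_sites = len(columns)
    if n_sites < 2:
        return {}
    raw = {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(n_sites), 2)
    }
    # Mean MI of each site with every other site.
    site_mean = [
        sum(raw[min(i, k), max(i, k)] for k in range(n_sites) if k != i)
        / (n_sites - 1)
        for i in range(n_sites)
    ]
    return {
        (i, j): mi - (site_mean[i] + site_mean[j]) / 2
        for (i, j), mi in raw.items()
    }


if __name__ == "__main__":
    # Toy alignment: four sequences, four sites (made-up data).
    toy = ["ACDA", "ACEC", "GTDA", "GTEC"]
    for pair, score in sorted(background_corrected_scores(toy).items()):
        print(pair, round(score, 3))
```

In an alignment where every column simply follows the same clade split, all raw scores are equal and the corrected scores drop to zero, which is the behaviour one would want from any scheme that discounts purely phylogenetic correlation.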