Gueudré Thomas, Baldassi Carlo, Zamparo Marco, Weigt Martin, Pagnani Andrea
Department of Applied Science and Technology, Politecnico di Torino, 10129 Torino, Italy.
Department of Applied Science and Technology, Politecnico di Torino, 10129 Torino, Italy; Human Genetics Foundation, Molecular Biotechnology Center, 10126 Torino, Italy.
Proc Natl Acad Sci U S A. 2016 Oct 25;113(43):12186-12191. doi: 10.1073/pnas.1607570113. Epub 2016 Oct 11.
Understanding protein-protein interactions is central to understanding almost all complex biological processes. Computational tools that exploit rapidly growing genomic databases to characterize protein-protein interactions are urgently needed. Such methods should connect multiple scales: from evolutionarily conserved interactions between families of homologous proteins, through the identification of specifically interacting proteins when a species contains multiple paralogs, down to the prediction of residues in physical contact across interaction interfaces. Statistical inference methods that detect residue-residue coevolution have recently triggered considerable progress in using sequence data for quaternary protein structure prediction; however, they require large joint alignments of homologous protein pairs known to interact. Generating such alignments is a complex computational task in its own right; in turn, the application of coevolutionary modeling has been restricted to proteins without paralogs, or to bacterial systems in which the corresponding coding genes are colocalized in operons. Here we show that the direct coupling analysis of residue coevolution can be extended to connect these scales and to simultaneously match interacting paralogs, identify interprotein residue-residue contacts, and discriminate interacting from noninteracting families in a multiprotein system. Our results extend the potential applications of coevolutionary analysis far beyond the cases treatable so far.