Enhancing coevolutionary signals in protein-protein interaction prediction through clade-wise alignment integration.

Authors

Fang Tao, Szklarczyk Damian, Hachilif Radja, von Mering Christian

Affiliations

Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland.

SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.

Publication information

Sci Rep. 2024 Mar 12;14(1):6009. doi: 10.1038/s41598-024-55655-9.

DOI:10.1038/s41598-024-55655-9

PMID:38472223

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10933411/

Abstract

Protein-protein interactions (PPIs) play essential roles in most biological processes. The binding interfaces between interacting proteins impose evolutionary constraints that have successfully been employed to predict PPIs from multiple sequence alignments (MSAs). To construct MSAs, critical choices have to be made: how to ensure the reliable identification of orthologs, and how to optimally balance the need for large alignments versus sufficient alignment quality. Here, we propose a divide-and-conquer strategy for MSA generation: instead of building a single, large alignment for each protein, multiple distinct alignments are constructed under distinct clades in the tree of life. Coevolutionary signals are searched separately within these clades, and are only subsequently integrated using machine learning techniques. We find that this strategy markedly improves overall prediction performance, concomitant with better alignment quality. Using the popular DCA algorithm to systematically search pairs of such alignments, a genome-wide all-against-all interaction scan in a bacterial genome is demonstrated. Given the recent successes of AlphaFold in predicting direct PPIs at atomic detail, a discover-and-refine approach is proposed: our method could provide a fast and accurate strategy for pre-screening the entire genome, submitting to AlphaFold only promising interaction candidates, thus reducing false positives as well as computation time.
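
The integration and pre-screening steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say which machine learning technique is used, so the logistic model below is an assumption chosen for simplicity, and the clade names, weights, bias, and function names are all hypothetical placeholders rather than values or identifiers from the paper. The per-clade scores are assumed to come from a coevolution method such as DCA run separately on each clade's paired alignment.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical clade names and model parameters, for illustration only.
# In practice such parameters would be learned from known interactions.
CLADE_WEIGHTS: Dict[str, float] = {
    "clade_A": 1.2,
    "clade_B": 0.8,
    "clade_C": 0.5,
}
BIAS = -2.0

Pair = Tuple[str, str]


def integrate_clade_scores(scores: Dict[str, float]) -> float:
    """Combine per-clade coevolution scores into one interaction probability.

    Assumption: a plain logistic model. Clades for which no alignment
    (and hence no score) exists simply contribute nothing to the sum.
    """
    z = BIAS + sum(
        CLADE_WEIGHTS[clade] * score
        for clade, score in scores.items()
        if clade in CLADE_WEIGHTS
    )
    return 1.0 / (1.0 + math.exp(-z))


def prescreen(
    pair_scores: Dict[Pair, Dict[str, float]], top_k: int
) -> List[Tuple[Pair, float]]:
    """Rank protein pairs by integrated probability and keep the top_k.

    The retained pairs are the candidates one would pass on to a slower,
    more detailed method (the "refine" step of discover-and-refine).
    """
    ranked = sorted(
        ((pair, integrate_clade_scores(s)) for pair, s in pair_scores.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    example = {
        ("protA", "protB"): {"clade_A": 3.0, "clade_B": 2.0},
        ("protA", "protC"): {"clade_A": 0.1},
    }
    for pair, prob in prescreen(example, top_k=1):
        print(pair, round(prob, 3))
```

Keeping the per-clade scores as separate features, instead of pooling all sequences into one alignment, is what lets a downstream model weight clades differently according to how informative each one is.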
