Department of Electrical and Computer Engineering, McGill University, 845 Sherbrooke Street West, Montréal, QC H3A 0G4, Canada.
Mila, Quebec AI Institute, 6666 St-Urbain Street #200, Montréal, QC H2S 3H1, Canada.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae405.
An overwhelming majority of protein-protein interaction (PPI) studies are conducted in a select few model organisms, largely because of the time and cost of the associated 'wet lab' experiments. In silico PPI inference methods are ideal tools to overcome these limitations, but they often struggle with cross-species predictions. We present INTREPPPID, a method that incorporates orthology data using a new 'quintuplet' neural network, which is constructed from five parallel encoders with shared parameters. INTREPPPID incorporates both a PPI classification task and an orthologous locality task. The latter learns embeddings in which orthologues lie at small Euclidean distances from one another and at large distances from the embeddings of all other proteins. INTREPPPID outperforms all other leading PPI inference methods tested on both the intraspecies and cross-species tasks using strict evaluation datasets. We show that INTREPPPID's orthologous locality loss increases performance because of the biological relevance of the orthologue data and not because of some other specious aspect of the architecture. Finally, we introduce PPI.bio and PPI Origami, a web server interface for INTREPPPID and a software tool for creating strict evaluation datasets, respectively. Together, these two initiatives aim to make both the use and development of PPI inference tools more accessible to the community.
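The orthologous locality idea described above can be illustrated with a small sketch. The abstract does not give the exact form of the loss, so the triplet-margin formulation below is an assumption, as are the function name `orthologous_locality_loss`, the `margin` parameter, and the toy two-dimensional embeddings. It is not INTREPPPID's implementation; it only shows how a loss can reward small Euclidean distances between orthologues and large distances to other proteins.

```python
import math


def orthologous_locality_loss(anchor, orthologue, other, margin=1.0):
    """Hypothetical triplet-margin loss on protein embeddings.

    The loss is zero once the anchor is closer to its orthologue than to
    the unrelated protein by at least `margin` (an assumed hyperparameter).
    """
    d_orthologue = math.dist(anchor, orthologue)  # Euclidean distance
    d_other = math.dist(anchor, other)
    return max(0.0, d_orthologue - d_other + margin)


# Toy embeddings (illustrative values only, not real data).
anchor = [0.0, 0.0]
orthologue = [3.0, 4.0]     # distance 5.0 from the anchor
far_protein = [6.0, 8.0]    # distance 10.0 from the anchor
near_protein = [0.0, 5.5]   # distance 5.5 from the anchor

print(orthologous_locality_loss(anchor, orthologue, far_protein))
print(orthologous_locality_loss(anchor, orthologue, near_protein))
```

In this sketch the first call incurs no penalty because the unrelated protein is already well separated, while the second incurs a positive penalty because the unrelated protein sits within the margin of the orthologue distance. In a shared-parameter network, the same encoder would produce all three embeddings, so minimising this term would pull orthologues together across species.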