Pitfalls of machine learning models for protein-protein interaction networks.

Author affiliations

Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, CB2 0BB Cambridge, United Kingdom.

British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, CB2 0BB Cambridge, United Kingdom.

Publication information

Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae012.

DOI:10.1093/bioinformatics/btae012

PMID:38200587

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10868344/

Abstract

MOTIVATION

Protein-protein interactions (PPIs) are essential to understanding biological pathways as well as their roles in development and disease. Computational tools, based on classic machine learning, have been successful at predicting PPIs in silico, but the lack of consistent and reliable frameworks for this task has led to network models that are difficult to compare and discrepancies between algorithms that remain unexplained.

RESULTS

To better understand the underlying inference mechanisms that underpin these models, we designed an open-source framework for benchmarking that accounts for a range of biological and statistical pitfalls while facilitating reproducibility. We use it to shed light on the impact of network topology and how different algorithms deal with highly connected proteins. By studying functional genomics-based and sequence-based models on human PPIs, we show their complementarity as the former performs best on lone proteins while the latter specializes in interactions involving hubs. We also show that algorithm design has little impact on performance with functional genomic data. We replicate our results between both human and S. cerevisiae data and demonstrate that models using functional genomics are better suited to PPI prediction across species. With rapidly increasing amounts of sequence and functional genomics data, our study provides a principled foundation for future construction, comparison, and application of PPI networks.

AVAILABILITY AND IMPLEMENTATION

The code and data are available on GitHub: https://github.com/Llannelongue/B4PPI.
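The hub versus lone-protein comparison described in the Results can be illustrated with a minimal sketch. It is not taken from the B4PPI code base: the function names, the degree-based hub definition, the `hub_threshold` value, and the toy data are all assumptions made for illustration. The idea is to score predictions separately for pairs that involve zero, one, or two highly connected proteins, so that a model that only learns "hubs interact with everything" can be told apart from one that also works on sparsely connected proteins.

```python
from collections import Counter, defaultdict


def protein_degrees(edges):
    """Count how many known interactions each protein takes part in."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def pair_category(pair, degree, hub_threshold):
    """Label a pair by how many of its two proteins are hubs.

    Assumption: a hub is any protein whose degree in the training
    network is at least `hub_threshold`.
    """
    n_hubs = sum(degree[p] >= hub_threshold for p in pair)
    return ("lone-lone", "hub-lone", "hub-hub")[n_hubs]


def stratified_accuracy(test_pairs, labels, predictions, degree, hub_threshold=3):
    """Compute accuracy separately for each pair category."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pair, y, y_hat in zip(test_pairs, labels, predictions):
        cat = pair_category(pair, degree, hub_threshold)
        total[cat] += 1
        correct[cat] += int(y == y_hat)
    return {cat: correct[cat] / total[cat] for cat in total}


# Toy training network (hypothetical): protein "A" is the only hub.
train_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")]
degree = protein_degrees(train_edges)

# Toy test pairs with true labels and hypothetical model predictions.
test_pairs = [("A", "E"), ("E", "G"), ("A", "F"), ("B", "G")]
labels = [1, 0, 1, 0]
predictions = [1, 1, 1, 0]

result = stratified_accuracy(test_pairs, labels, predictions, degree)
print(result)
```

Reporting one score per category, rather than a single pooled score, is one simple way to expose the dependence on network topology that the abstract refers to; the metric and the threshold used here are placeholders.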

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11b9/10868344/f5c995af1a03/btae012f1.jpg

Similar articles

Predicting Protein-Protein Interactions via Random Ferns with Evolutionary Matrix Representation.

Comput Math Methods Med. 2022 Feb 22;2022:7191684. doi: 10.1155/2022/7191684. eCollection 2022.

Completing sparse and disconnected protein-protein network by deep learning.

BMC Bioinformatics. 2018 Mar 22;19(1):103. doi: 10.1186/s12859-018-2112-7.

A new two-stage method for revealing missing parts of edges in protein-protein interaction networks.

PLoS One. 2017 May 11;12(5):e0177029. doi: 10.1371/journal.pone.0177029. eCollection 2017.

Machine-learning techniques for the prediction of protein-protein interactions.

J Biosci. 2019 Sep;44(4).

Hierarchical graph learning for protein-protein interaction.

Nat Commun. 2023 Feb 25;14(1):1093. doi: 10.1038/s41467-023-36736-1.

t-LSE: a novel robust geometric approach for modeling protein-protein interaction networks.

PLoS One. 2013;8(4):e58368. doi: 10.1371/journal.pone.0058368. Epub 2013 Apr 1.

SiPAN: simultaneous prediction and alignment of protein-protein interaction networks.

Bioinformatics. 2015 Jul 15;31(14):2356-63. doi: 10.1093/bioinformatics/btv160. Epub 2015 Mar 18.

An iteration method for identifying yeast essential proteins from heterogeneous network.

BMC Bioinformatics. 2019 Jun 24;20(1):355. doi: 10.1186/s12859-019-2930-2.

A multi-network clustering method for detecting protein complexes from multiple heterogeneous networks.

BMC Bioinformatics. 2017 Dec 1;18(Suppl 13):463. doi: 10.1186/s12859-017-1877-4.

Cited by

Ten recommendations for reducing the carbon footprint of research computing in human neuroimaging.

Imaging Neurosci (Camb). 2024 Jan 29;1. doi: 10.1162/imag_a_00043. eCollection 2023.

Evaluation of Physics-Based Protein Design Methods for Predicting Single Residue Effects on Peptide Binding Specificities.

J Comput Chem. 2025 Jun 30;46(17):e70160. doi: 10.1002/jcc.70160.

Sensitivity analysis on protein-protein interaction networks through deep graph networks.

BMC Bioinformatics. 2025 May 8;26(1):124. doi: 10.1186/s12859-025-06140-1.

The role of alpha-synuclein in synucleinopathy: Impact on lipid regulation at mitochondria-ER membranes.

NPJ Parkinsons Dis. 2025 Apr 30;11(1):103. doi: 10.1038/s41531-025-00960-x.

PIPENN-EMB ensemble net and protein embeddings generalise protein interface prediction beyond homology.

Sci Rep. 2025 Feb 5;15(1):4391. doi: 10.1038/s41598-025-88445-y.
