Deep learning models for unbiased sequence-based PPI prediction plateau at an accuracy of 0.65.

Authors

Reim Timo, Hartebrodt Anne, Blumenthal David B, Bernett Judith, List Markus

Affiliations

Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Freising, 85354, Germany.

Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91052, Germany.

Publication information

Bioinformatics. 2025 Jul 1;41(Supplement_1):i590-i598. doi: 10.1093/bioinformatics/btaf192.

DOI: 10.1093/bioinformatics/btaf192
PMID: 40662806
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12261406/

Abstract

MOTIVATION

As most proteins interact with other proteins to perform their respective functions, methods to computationally predict these interactions have been developed. However, flawed evaluation schemes and data leakage in test sets have obscured the fact that sequence-based protein-protein interaction (PPI) prediction is still an open problem. Recently, methods achieving better-than-random performance on leakage-reduced PPI data have been proposed.

RESULTS

Here, we show that the use of ESM-2 protein embeddings explains this performance gain irrespective of model architecture. We compared the performance of models with varying complexity, per-protein, and per-token embeddings, as well as the influence of self- or cross-attention, where all models plateaued at an accuracy of 0.65. Moreover, we show that the tested sequence-based models cannot implicitly learn a contact map as an intermediate layer. These results imply that other input types, such as structure, might be necessary for producing reliable PPI predictions.
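The distinction between per-token and per-protein embeddings, and between self- and cross-attention, can be made concrete with a small sketch. This is not the authors' implementation (that lives in the linked repository); it is a dependency-free illustration in which random vectors stand in for ESM-2 output, the attention has no learned projections, and every function name (`mean_pool`, `attention`, `pair_features`, `accuracy`) is hypothetical.

```python
import math
import random


def mean_pool(token_embeddings):
    """Collapse per-token embeddings (L x d) into one per-protein vector (d)."""
    length = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[j] for tok in token_embeddings) / length for j in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys_values):
    """Scaled dot-product attention without learned weights (an assumption).

    Each query token becomes a weighted average of the key/value tokens.
    attention(a, a) is self-attention; attention(a, b) is cross-attention.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(x * y for x, y in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                         for j in range(dim)])
    return attended


def pair_features(protein_a, protein_b):
    """Per-protein input for a pair: pool each protein, then concatenate."""
    return mean_pool(protein_a) + mean_pool(protein_b)


def accuracy(y_true, y_pred):
    """Fraction of correct labels; 0.5 is chance level on a balanced test set."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Placeholder embeddings: two proteins of different length, dimension 8.
rng = random.Random(0)
protein_a = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
protein_b = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(7)]

per_protein_input = pair_features(protein_a, protein_b)    # length 16
a_self = attention(protein_a, protein_a)                   # 5 x 8
a_cross = attention(protein_a, protein_b)                  # 5 x 8
```

In the per-protein setting a classifier sees only the pooled, concatenated vector, whereas per-token models keep the full L x d matrices and may let the residues of one protein attend to those of the other before pooling.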

AVAILABILITY AND IMPLEMENTATION

All code for models and execution of the models is available at https://github.com/daisybio/PPI_prediction_study. Python version 3.8.18 and PyTorch version 2.1.1 were used for this study. The environment containing the versions of all other packages used can be found in the GitHub repository. The used data are available at https://doi.org/10.6084/m9.figshare.21591618.v3.
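Before rerunning the study, it may help to confirm that the local environment matches the stated versions. The following is a minimal sketch, assuming PyTorch is installed under the distribution name `torch`; the helper names `installed_version` and `matches` are hypothetical, and the environment file in the repository remains the authoritative source for all other package versions.

```python
import sys
from importlib import metadata

# Versions stated in the abstract.
EXPECTED_PYTHON = "3.8.18"
EXPECTED_TORCH = "2.1.1"


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def matches(installed, expected):
    """Compare versions, ignoring a local build suffix such as '+cu121'."""
    if installed is None:
        return False
    return installed.split("+", 1)[0] == expected


python_version = ".".join(str(part) for part in sys.version_info[:3])
torch_version = installed_version("torch")

print("Python", python_version, "matches:", matches(python_version, EXPECTED_PYTHON))
print("PyTorch", torch_version, "matches:", matches(torch_version, EXPECTED_TORCH))
```

A mismatch does not necessarily prevent the code from running, but results may then differ from those reported.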


Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c82a/12261406/424b6de7b801/btaf192f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c82a/12261406/e6b6a6c3ef7f/btaf192f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c82a/12261406/5842e01e39bd/btaf192f3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c82a/12261406/1a403b6ce9fe/btaf192f4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c82a/12261406/6a4dddc5afd5/btaf192f5.jpg

