
Simple sequence-based kernels do not predict protein-protein interactions.

Affiliation

School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.

Publication information

Bioinformatics. 2010 Oct 15;26(20):2610-4. doi: 10.1093/bioinformatics/btq483. Epub 2010 Aug 27.

DOI:10.1093/bioinformatics/btq483
PMID:20801913
Abstract

MOTIVATION

A number of methods have been reported that predict protein-protein interactions (PPIs) with high accuracy using only simple sequence-based features such as amino acid 3mer content. This is surprising, given that many protein interactions have high specificity that depends on detailed atomic recognition between physicochemically complementary surfaces. Are the reported high accuracies realistic?
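
To make "amino acid 3mer content" concrete, the following is a minimal sketch of how such a feature and a kernel over it could be computed. The function names `kmer_counts` and `spectrum_kernel` are hypothetical, and the plain dot-product (spectrum-style) kernel is an assumption chosen for illustration; the abstract does not specify which kernels the evaluated methods use.

```python
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers in a protein sequence (hypothetical helper)."""
    seq = sequence.upper()
    # A sequence shorter than k yields an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=3):
    """Dot product of the two k-mer count vectors (a spectrum-style kernel)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Only k-mers present in both sequences contribute to the sum.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


if __name__ == "__main__":
    print(kmer_counts("MKVLAAGMKV"))
    print(spectrum_kernel("MKVLAAGMKV", "GMKVLA"))
```

Features of this kind describe only local sequence composition, which is why high reported accuracies for interaction prediction are unexpected.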

RESULTS

We find that the reported accuracies of the predictions are significantly over-estimated, and strongly dependent on the structure of the training and testing datasets used. The choice of which protein pairs are deemed as non-interactions in the training data has a variable impact on the accuracy estimates, and the accuracies can be artificially inflated by a bias towards dominant samples in the positive data which result from the presence of hub proteins in the protein interaction network. To address this bias, we propose a positive set-specific method to create a 'balanced' negative set maintaining the degree distribution for each protein, leading to the conclusion that simple sequence-based features contain insufficient information to be useful for predicting PPIs, but that protein domain-based features have some predictive value.
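
The idea of a "balanced" negative set can be sketched as follows: every protein appears in the negative pairs exactly as often as it appears in the positive pairs, so hub proteins are no more frequent in one class than in the other. This is only an illustrative sketch using simple rejection sampling; it is not the BRS-nonint implementation, whose details are not given in the abstract, and the names `balanced_negatives` and `degrees` are hypothetical. It assumes the positive pairs contain no self-interactions and no duplicates.

```python
import random
from collections import Counter


def degrees(pairs):
    """Number of pairs each protein takes part in."""
    return Counter(protein for pair in pairs for protein in pair)


def balanced_negatives(positives, seed=0, max_tries=10_000):
    """Sample negative pairs that preserve each protein's positive-set degree.

    Assumption: rejection sampling over whole shuffles, which is adequate for
    small examples but too slow for large networks.
    """
    positive_set = {frozenset(pair) for pair in positives}
    # One "stub" per appearance of a protein in the positive set.
    stubs = [protein for pair in positives for protein in pair]
    rng = random.Random(seed)
    for _ in range(max_tries):
        rng.shuffle(stubs)
        pairs = [frozenset(stubs[i:i + 2]) for i in range(0, len(stubs), 2)]
        no_self_pairs = all(len(pair) == 2 for pair in pairs)
        no_positives = all(pair not in positive_set for pair in pairs)
        no_duplicates = len(set(pairs)) == len(pairs)
        if no_self_pairs and no_positives and no_duplicates:
            return [tuple(sorted(pair)) for pair in pairs]
    raise RuntimeError("no balanced negative set found within max_tries")


if __name__ == "__main__":
    positives = [("A", "B"), ("B", "C"), ("C", "D"),
                 ("D", "E"), ("E", "F"), ("F", "A")]
    negatives = balanced_negatives(positives, seed=1)
    print(negatives)
    print(degrees(negatives) == degrees(positives))
```

With negatives drawn uniformly at random instead, a hub protein would occur mostly in positive pairs, and a classifier could score well simply by recognising hubs rather than by learning anything about interaction.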

AVAILABILITY

Our method, named 'BRS-nonint', is available at http://www.bioinformatics.leeds.ac.uk/BRS-nonint/. All the datasets used in this study are derived from publicly available data, and are available at http://www.bioinformatics.leeds.ac.uk/BRS-nonint/PPI_RandomBalance.html

CONTACT

maozuguo@hit.edu.cn; d.r.westhead@leeds.ac.uk.

Similar articles

1
Simple sequence-based kernels do not predict protein-protein interactions.
Bioinformatics. 2010 Oct 15;26(20):2610-4. doi: 10.1093/bioinformatics/btq483. Epub 2010 Aug 27.
2
Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection.
Biochem Biophys Res Commun. 2009 Mar 6;380(2):318-22. doi: 10.1016/j.bbrc.2009.01.077. Epub 2009 Jan 24.
3
Computational prediction of protein-protein interactions.
Methods Mol Biol. 2004;261:445-68. doi: 10.1385/1-59259-762-9:445.
4
Computational design, construction, and characterization of a set of specificity determining residues in protein-protein interactions.
Proteins. 2012 Oct;80(10):2426-36. doi: 10.1002/prot.24127. Epub 2012 Jul 10.
5
A discriminative approach for identifying domain-domain interactions from protein-protein interactions.
Proteins. 2010 Apr;78(5):1243-53. doi: 10.1002/prot.22643.
6
Preferential use of protein domain pairs as interaction mediators: order and transitivity.
Bioinformatics. 2010 Oct 15;26(20):2564-70. doi: 10.1093/bioinformatics/btq495. Epub 2010 Aug 27.
7
A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction.
Curr Opin Struct Biol. 2005 Jun;15(3):285-9. doi: 10.1016/j.sbi.2005.05.011.
8
Evaluation of different domain-based methods in protein interaction prediction.
Biochem Biophys Res Commun. 2009 Dec 18;390(3):357-62. doi: 10.1016/j.bbrc.2009.09.130. Epub 2009 Oct 2.
9
Large-scale prediction of human protein-protein interactions from amino acid sequence based on latent topic features.
J Proteome Res. 2010 Oct 1;9(10):4992-5001. doi: 10.1021/pr100618t.
10
Kernel methods for predicting protein-protein interactions.
Bioinformatics. 2005 Jun;21 Suppl 1:i38-46. doi: 10.1093/bioinformatics/bti1016.

Cited by

1
Negative sampling strategies impact the prediction of scale-free biomolecular network interactions with machine learning.
BMC Biol. 2025 May 9;23(1):123. doi: 10.1186/s12915-025-02231-w.
2
Prediction of drug target interaction based on under sampling strategy and random forest algorithm.
PLoS One. 2025 Mar 6;20(3):e0318420. doi: 10.1371/journal.pone.0318420. eCollection 2025.
3
A protein sequence-based deep transfer learning framework for identifying human proteome-wide deubiquitinase-substrate interactions.
Nat Commun. 2024 May 28;15(1):4519. doi: 10.1038/s41467-024-48446-3.
4
Pitfalls of machine learning models for protein-protein interaction networks.
Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae012.
5
A robust protein language model for SARS-CoV-2 protein-protein interaction network prediction.
Artif Intell Med. 2023 Aug;142:102574. doi: 10.1016/j.artmed.2023.102574. Epub 2023 May 6.
6
Assessment of community efforts to advance network-based prediction of protein-protein interactions.
Nat Commun. 2023 Mar 22;14(1):1582. doi: 10.1038/s41467-023-37079-7.
7
Computational Methods and Deep Learning for Elucidating Protein Interaction Networks.
Methods Mol Biol. 2023;2553:285-323. doi: 10.1007/978-1-0716-2617-7_15.
8
UnbiasedDTI: Mitigating Real-World Bias of Drug-Target Interaction Prediction by Using Deep Ensemble-Balanced Learning.
Molecules. 2022 May 6;27(9):2980. doi: 10.3390/molecules27092980.
9
Benchmark Evaluation of Protein-Protein Interaction Prediction Algorithms.
Molecules. 2021 Dec 22;27(1):41. doi: 10.3390/molecules27010041.
10
Computationally Reconstructed Interactome of USDA110 Reveals Novel Functional Modules and Protein Hubs for Symbiotic Nitrogen Fixation.
Int J Mol Sci. 2021 Nov 2;22(21):11907. doi: 10.3390/ijms222111907.