Negative sampling strategies impact the prediction of scale-free biomolecular network interactions with machine learning.

Authors

Li Pengpai, Shao Bowen, Zhao Guoqing, Liu Zhi-Ping

Affiliations

Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China.

National Center for Applied Mathematics, Shandong University, Jinan, 250100, Shandong, China.

Publication details

BMC Biol. 2025 May 9;23(1):123. doi: 10.1186/s12915-025-02231-w.

DOI: 10.1186/s12915-025-02231-w
PMID: 40346567
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12065207/
Abstract

BACKGROUND

Understanding protein-molecular interactions is crucial for unraveling the mechanisms underlying diverse biological processes. Machine learning (ML) techniques have been extensively employed to predict these interactions and have attracted substantial research attention. Previous studies have predominantly focused on improving model performance through novel and efficient ML approaches, often yielding overoptimistic estimates of predictive performance. However, these advances frequently neglect the inherent biases stemming from network properties, particularly in biological contexts.

RESULTS

In this study, we examined the biases inherent in ML models during the learning and prediction of protein-molecular interactions, particularly those arising from the scale-free property of biological networks, in which a few nodes have many connections while most have very few. Our comprehensive analysis across diverse tasks, datasets, and ML methods provides compelling evidence of these biases. We found that the training and evaluation of ML models are profoundly influenced by network topology, which can distort assessments of model performance. To mitigate this issue, we propose the degree distribution balanced (DDB) sampling strategy, a straightforward yet effective approach that alleviates biases stemming from network properties. This method further exposes the limitations of certain ML models in learning protein-molecular interactions solely from intrinsic molecular features.
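The abstract does not spell out the DDB procedure, so the following is only a minimal sketch of the underlying idea under stated assumptions, not the authors' implementation. It builds a synthetic bipartite protein-molecule graph whose node weights follow a Pareto distribution (a stand-in for a scale-free network; the helper `scale_free_bipartite` and all sizes are hypothetical), scores each pair using nothing but the product of its endpoints' degrees, and compares two ways of drawing negatives: uniform random pairs, and a degree-balanced sampler (`degree_balanced_negatives`, our own illustration) that reuses each protein and molecule in the negative set as often as it appears in the positive set. If degree alone separates positives from uniform negatives, an AUC can look strong without any molecular feature being learned.

```python
import random
from collections import Counter


def scale_free_bipartite(n_prot, n_mol, n_draws, alpha, rng):
    """Hypothetical generator: endpoints are drawn with Pareto (heavy-tailed)
    weights, so a few hub nodes collect many interactions."""
    w_prot = [rng.paretovariate(alpha) for _ in range(n_prot)]
    w_mol = [rng.paretovariate(alpha) for _ in range(n_mol)]
    prots = rng.choices(range(n_prot), weights=w_prot, k=n_draws)
    mols = rng.choices(range(n_mol), weights=w_mol, k=n_draws)
    return sorted(set(zip(prots, mols)))  # deduplicated positive pairs


def uniform_negatives(positives, n_prot, n_mol, rng):
    """Baseline: unobserved pairs drawn uniformly at random."""
    pos, neg = set(positives), set()
    while len(neg) < len(positives):
        pair = (rng.randrange(n_prot), rng.randrange(n_mol))
        if pair not in pos:
            neg.add(pair)
    return sorted(neg)


def degree_balanced_negatives(positives, rng, max_rounds=50):
    """Illustrative degree-balanced sampler (an assumption, not the paper's code):
    each node is reused in negatives at most as often as it appears in positives."""
    left = [p for p, _ in positives]   # one protein "stub" per positive edge
    right = [m for _, m in positives]  # one molecule "stub" per positive edge
    pos, neg = set(positives), set()
    for _ in range(max_rounds):
        rng.shuffle(right)
        retry_left, retry_right = [], []
        for p, m in zip(left, right):
            if (p, m) in pos or (p, m) in neg:
                retry_left.append(p)   # collision: return both stubs to the pool
                retry_right.append(m)
            else:
                neg.add((p, m))
        left, right = retry_left, retry_right
        if not left:
            break
    return sorted(neg)


def degree_scores(pairs, deg_p, deg_m):
    # A "model" that ignores molecular features and uses node degree only.
    return [deg_p[p] * deg_m[m] for p, m in pairs]


def auc(pos_scores, neg_scores):
    """ROC AUC via the Mann-Whitney U statistic, averaging ranks over ties."""
    ranked = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores])
    rank_sum, i = 0.0, 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(label for _, label in ranked[i:j])
        i = j
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


rng = random.Random(0)
N_PROT, N_MOL = 1000, 2000
positives = scale_free_bipartite(N_PROT, N_MOL, 5000, 1.5, rng)
deg_p = Counter(p for p, _ in positives)
deg_m = Counter(m for _, m in positives)
pos_scores = degree_scores(positives, deg_p, deg_m)

uniform_neg = uniform_negatives(positives, N_PROT, N_MOL, rng)
balanced_neg = degree_balanced_negatives(positives, rng)
auc_uniform = auc(pos_scores, degree_scores(uniform_neg, deg_p, deg_m))
auc_ddb = auc(pos_scores, degree_scores(balanced_neg, deg_p, deg_m))
print(f"degree-only AUC, uniform negatives:         {auc_uniform:.3f}")
print(f"degree-only AUC, degree-balanced negatives: {auc_ddb:.3f}")
```

On graphs like this, the degree-only AUC should come out well above 0.5 with uniform negatives, because positive pairs are dominated by hubs while uniform pairs mostly involve sparsely connected nodes, and close to 0.5 with degree-balanced negatives, where hubs appear equally often on both sides. Colliding pairs are simply reshuffled, so a few hub stubs may stay unmatched and the balance is only approximate.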

CONCLUSIONS

Our findings offer a novel perspective for assessing, with greater fairness, how well ML models infer protein-molecular interactions. By addressing biases introduced by network properties, the DDB sampling approach provides a more balanced and precise assessment of model capabilities. These insights could bolster the reliability of ML models in bioinformatics and foster a more stringent framework for evaluating predictions of protein-molecular interactions.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5fe/12065207/0ece91ec435b/12915_2025_2231_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5fe/12065207/afa79050a09a/12915_2025_2231_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5fe/12065207/8c4a9923b621/12915_2025_2231_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5fe/12065207/03ca13f5d0aa/12915_2025_2231_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5fe/12065207/da569caa7e21/12915_2025_2231_Fig5_HTML.jpg
