The probability of edge existence due to node degree: a baseline for network-based predictions.

Authors

Zietz Michael, Himmelstein Daniel S, Kloster Kyle, Williams Christopher, Nagle Michael W, Greene Casey S

Affiliations

Department of Physics & Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Publication information

bioRxiv. 2023 Jan 6:2023.01.05.522939. doi: 10.1101/2023.01.05.522939.

DOI:10.1101/2023.01.05.522939
PMID:36711569
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9881952/
Abstract

Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and the network's specific connections. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Degree's predictive performance diminishes when the networks used for training and testing, despite measuring the same biological relationships, were generated using distinct techniques and hence have large differences in degree distribution. We introduce the permutation-derived edge prior as the probability that an edge exists based only on degree. The edge prior shows excellent discrimination and calibration for 20 biomedical networks (16 bipartite, 3 undirected, 1 directed), with AUROCs frequently exceeding 0.85. Researchers seeking to predict new or missing edges in biological networks should use the edge prior as a baseline to identify the fraction of performance that is nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
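
To make the idea of a permutation-derived edge prior concrete, here is a minimal sketch for a small undirected network. It is an illustration only and is not the implementation in the xswap package: the function names `permute_edges` and `edge_prior`, the parameters `n_permutations` and `swaps_per_edge`, and the toy edge list are all assumptions introduced here. The sketch repeatedly rewires the network with degree-preserving edge swaps and estimates, for each node pair, the fraction of permuted networks in which that pair is connected.

```python
import random
from collections import Counter
from itertools import combinations


def permute_edges(edges, n_swaps, rng):
    """Degree-preserving randomization of an undirected simple graph.

    Hypothetical helper: repeatedly picks two edges (a, b) and (c, d) and
    rewires them to (a, d) and (c, b), rejecting swaps that would create
    a self-loop or a duplicate edge.
    """
    edges = [tuple(sorted(e)) for e in edges]
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        # Randomly orient the second edge so both rewirings are reachable.
        if rng.random() < 0.5:
            c, d = d, c
        new1 = tuple(sorted((a, d)))
        new2 = tuple(sorted((c, b)))
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue  # invalid swap, keep the current edges
        edge_set.difference_update((edges[i], edges[j]))
        edge_set.update((new1, new2))
        edges[i], edges[j] = new1, new2
    return edges


def edge_prior(edges, n_permutations=200, swaps_per_edge=10, seed=0):
    """Estimate, per node pair, the fraction of permuted networks
    (same degree sequence) in which the pair is connected."""
    rng = random.Random(seed)
    counts = Counter()
    current = list(edges)
    for _ in range(n_permutations):
        current = permute_edges(current, swaps_per_edge * len(current), rng)
        counts.update(current)
    nodes = sorted({n for e in edges for n in e})
    return {pair: counts[pair] / n_permutations
            for pair in combinations(nodes, 2)}


# Toy network (made up for illustration): node 0 is a hub.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4), (4, 5)]
prior = edge_prior(toy_edges, n_permutations=50, swaps_per_edge=5, seed=1)
```

Because every permuted network keeps the original degree sequence, the estimated priors of all pairs involving a node sum to that node's degree. The resulting scores can then serve as a degree-only baseline: a prediction method's AUROC can be compared against the AUROC obtained by ranking node pairs by this prior alone.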
