The effect of statistical normalization on network propagation scores.

Affiliations

B2SLab, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, CIBER-BBN, Barcelona, 08028, Spain.

Esplugues de Llobregat, Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Barcelona, 08950, Spain.

Publication information

Bioinformatics. 2021 May 5;37(6):845-852. doi: 10.1093/bioinformatics/btaa896.

DOI:10.1093/bioinformatics/btaa896
PMID:33070187
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8097756/
Abstract

MOTIVATION

Network diffusion and label propagation are fundamental tools in computational biology, with applications such as gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, owing to concerns that network topology can bias diffusion scores. This raises the question of what the statistical properties of such diffusion processes are, and whether they are biased, in each of their applications. In this work, we characterized some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein-protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels.

RESULTS

Diffusion scores starting from binary labels were affected by the label codification and exhibited a problem-dependent topological bias that could be removed by statistical normalization. Parametric and non-parametric normalization addressed both points by being codification-independent and by equalizing the bias. We identified and quantified two sources of bias (mean value and variance) that yielded performance differences when normalizing the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalization was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem- and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities.
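
To make the idea of parametric normalization concrete, the following is a minimal sketch and not the authors' implementation (their code is in the repositories listed under Availability). It assumes a regularised Laplacian kernel K = (I + L)^-1, raw scores f = K y, and a null model in which the labels are permuted uniformly over all nodes; the function names are hypothetical. Under that assumed null model, the mean and variance of each node's score follow from the standard moments of a linear permutation statistic, and the z-score is the raw score centred and scaled by them.

```python
import numpy as np


def regularised_laplacian_kernel(adj):
    """Return K = (I + L)^-1, with L = D - A the unnormalised Laplacian.

    The kernel choice is an assumption made for this sketch.
    """
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.inv(np.eye(adj.shape[0]) + lap)


def null_moments(kernel, y):
    """Exact null mean and variance of f = K y under uniform label permutation.

    Assumes every node's label takes part in the permutation.
    """
    n = len(y)
    mu_y = y.mean()
    var_y = y.var()  # population variance of the labels (ddof=0)
    row_sum = kernel.sum(axis=1)
    row_sq_sum = (kernel ** 2).sum(axis=1)
    mean = mu_y * row_sum
    var = var_y * n / (n - 1) * (row_sq_sum - row_sum ** 2 / n)
    return mean, var


def z_scores(kernel, y):
    """Parametric normalization: centre and scale the raw diffusion scores."""
    mean, var = null_moments(kernel, y)
    return (kernel @ y - mean) / np.sqrt(var)


# Toy graph (hypothetical): node 0 is a hub, nodes 3 and 4 form a tail.
adj = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
labels = np.array([1.0, 1.0, 0.0, 0.0, 0.0])

K = regularised_laplacian_kernel(adj)
raw = K @ labels
z = z_scores(K, labels)
```

In this sketch the z-scores are unchanged if the labels are recoded from {0, 1} to {-1, 1}, because the recoding shifts and rescales the raw score and its null moments consistently. This mirrors, for this assumed setup only, the codification independence described above.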

AVAILABILITY

The code is publicly available at https://github.com/b2slab/diffuBench and the data underlying this article are available at https://github.com/b2slab/retroData.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
