Assessing statistical significance in causal graphs.

Affiliation

Computational Sciences Center of Emphasis, Pfizer Worldwide Research & Development, Cambridge, MA, USA.

Publication information

BMC Bioinformatics. 2012 Feb 20;13:35. doi: 10.1186/1471-2105-13-35.

DOI:10.1186/1471-2105-13-35

PMID:22348444

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3307026/

Abstract

BACKGROUND

Causal graphs are an increasingly popular tool for the analysis of biological datasets. In particular, signed causal graphs--directed graphs whose edges additionally have a sign denoting upregulation or downregulation--can be used to model regulatory networks within a cell. Such models allow prediction of downstream effects of regulation of biological entities; conversely, they also enable inference of causative agents behind observed expression changes. However, due to their complex nature, signed causal graph models present special challenges with respect to assessing statistical significance. In this paper we frame and solve two fundamental computational problems that arise in practice when computing appropriate null distributions for hypothesis testing.
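As a minimal sketch of how a signed causal graph yields predictions, the Python below stores edges in a hypothetical `(source, target) -> ±1` mapping. It predicts the direction of each direct target when a regulator is perturbed up (+1) or down (-1), then scores that prediction against observed ternary classifications (+1 up, -1 down, 0 unchanged) by taking their dot product. It assumes one-step propagation only; the paper's models may use longer paths. `predict_downstream` and `ternary_score` are illustrative names, not an existing API.

```python
from typing import Dict, Tuple

# Hypothetical encoding: (source, target) -> +1 (upregulates) or -1 (downregulates).
SignedGraph = Dict[Tuple[str, str], int]


def predict_downstream(graph: SignedGraph, regulator: str, direction: int) -> Dict[str, int]:
    """Predicted state (+1 up, -1 down) of each direct target of `regulator`
    when it is perturbed in `direction` (+1 or -1); unreached genes are omitted."""
    return {tgt: direction * sign for (src, tgt), sign in graph.items() if src == regulator}


def ternary_score(predicted: Dict[str, int], observed: Dict[str, int]) -> int:
    """Dot product of ternary vectors: +1 per agreement, -1 per contradiction,
    0 where either side is 0 (genes missing from `observed` count as 0)."""
    return sum(p * observed.get(g, 0) for g, p in predicted.items())


graph: SignedGraph = {("TF1", "g1"): 1, ("TF1", "g2"): -1, ("TF1", "g3"): 1, ("TF2", "g3"): -1}
observed = {"g1": 1, "g2": -1, "g3": -1, "g4": 1}
print(ternary_score(predict_downstream(graph, "TF1", +1), observed))  # 1
print(ternary_score(predict_downstream(graph, "TF1", -1), observed))  # -1
```

Under this encoding, the hypothesis "TF1 is upregulated" agrees with g1 and g2 and contradicts g3, for a net score of 1. The reverse hypothesis scores -1.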

RESULTS

First, we show how to compute a p-value for agreement between observed and model-predicted classifications of gene transcripts as upregulated, downregulated, or neither. Specifically, how likely are the classifications to agree to at least the same extent under the null distribution in which the observed classification is randomized? This problem, which we call "Ternary Dot Product Distribution" owing to its mathematical form, can be viewed as a generalization of Fisher's exact test to ternary variables. We present two computationally efficient algorithms for computing the Ternary Dot Product Distribution and investigate its combinatorial structure analytically and numerically to establish computational complexity bounds.

Second, we develop an algorithm for efficiently performing random sampling of causal graphs. This enables p-value computation under a different, equally important null distribution obtained by randomizing the graph topology while keeping its basic structure fixed: connectedness and the positive and negative in- and out-degrees of each vertex. We provide an algorithm for sampling a graph from this distribution uniformly at random. We also highlight theoretical challenges unique to signed causal graphs; previous work on graph randomization has studied undirected graphs and directed but unsigned graphs.
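The two null models described in the results can be sketched in Python as follows. Neither function is the paper's algorithm.

- `tdpd` computes the Ternary Dot Product Distribution exactly. It counts how many observed +1 and -1 labels fall on the model's +1, -1 and 0 positions when the observed vector is randomly permuted. This is a plain O(n^4) enumeration, not either of the paper's more efficient algorithms.
- `randomize` is a generic sign-preserving edge-swap Markov chain. It keeps every vertex's positive and negative in- and out-degrees and rejects swaps that break weak connectivity. Swaps of this kind are not guaranteed to reach every valid signed graph or to sample uniformly, which is the kind of difficulty the paper highlights.

All function names are hypothetical.

```python
import random
from collections import defaultdict
from fractions import Fraction
from math import comb


def ways(m, up, down):
    """Ways to place `up` +1 labels and `down` -1 labels into m slots."""
    if up < 0 or down < 0 or up + down > m:
        return 0
    return comb(m, up) * comb(m - up, down)


def tdpd(model, observed):
    """Exact distribution of sum(model[i] * observed[i]) when the entries of
    `observed` are randomly permuted (every arrangement equally likely)."""
    a, b = model.count(1), model.count(-1)
    z = len(model) - a - b                      # model positions predicted 0
    p, q = observed.count(1), observed.count(-1)
    counts = defaultdict(int)
    for uu in range(min(a, p) + 1):                   # model +1, observed +1
        for ud in range(min(a - uu, q) + 1):          # model +1, observed -1
            for du in range(min(b, p - uu) + 1):      # model -1, observed +1
                for dd in range(min(b - du, q - ud) + 1):  # model -1, observed -1
                    w = ways(a, uu, ud) * ways(b, du, dd) * ways(z, p - uu - du, q - ud - dd)
                    if w:
                        counts[uu - ud - du + dd] += w
    total = sum(counts.values())
    return {d: Fraction(c, total) for d, c in sorted(counts.items())}


def p_value(model, observed):
    """One-sided p-value: probability of agreement at least as high as observed."""
    score = sum(m * o for m, o in zip(model, observed))
    return sum(pr for d, pr in tdpd(model, observed).items() if d >= score)


def signed_degrees(edges):
    """Per-vertex (out+, out-, in+, in-) counts for signed edges (u, v, sign)."""
    deg = defaultdict(lambda: [0, 0, 0, 0])
    for u, v, s in edges:
        deg[u][0 if s > 0 else 1] += 1
        deg[v][2 if s > 0 else 3] += 1
    return {x: tuple(c) for x, c in deg.items()}


def weakly_connected(edges):
    """True if the vertices touched by `edges` form one weakly connected component."""
    adj = defaultdict(set)
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    if not adj:
        return True
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def randomize(edges, n_steps, seed=0):
    """Swap (u->v, x->y) to (u->y, x->v) for two edges of the same sign, rejecting
    self-loops, duplicate edges and disconnecting swaps; signed degrees are kept."""
    rng = random.Random(seed)
    edges = list(edges)
    pairs = {(u, v) for u, v, _ in edges}
    for _ in range(n_steps):
        e1, e2 = rng.sample(range(len(edges)), 2)
        (u, v, s), (x, y, t) = edges[e1], edges[e2]
        if s != t or u == y or x == v or (u, y) in pairs or (x, v) in pairs:
            continue
        proposal = edges.copy()
        proposal[e1], proposal[e2] = (u, y, s), (x, v, s)
        if not weakly_connected(proposal):
            continue
        pairs -= {(u, v), (x, y)}
        pairs |= {(u, y), (x, v)}
        edges = proposal
    return edges


print(p_value([1, 1, -1, 0], [1, 1, -1, 0]))  # 1/12
g = [("A", "B", 1), ("B", "C", -1), ("C", "D", 1), ("D", "A", -1),
     ("A", "C", 1), ("B", "D", 1), ("C", "A", -1)]
r = randomize(g, 1000)
print(signed_degrees(r) == signed_degrees(g))  # True
```

In the first usage line, only 1 of the 12 distinct arrangements of the observed labels reproduces the perfect score of 3, so the p-value is 1/12. Exact `Fraction` arithmetic avoids rounding in tail sums. Because `randomize` only ever swaps targets between two edges of the same sign, the signed in- and out-degrees of every vertex are preserved by construction.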

CONCLUSION

We present algorithmic solutions to two statistical significance questions necessary to apply the causal graph methodology, a powerful tool for biological network analysis. The algorithms we present are both fast and provably correct. Our work may be of independent interest in non-biological contexts as well, as it generalizes mathematical results that have been studied extensively in other fields.

Similar articles

Assessing statistical significance in causal graphs.

BMC Bioinformatics. 2012 Feb 20;13:35. doi: 10.1186/1471-2105-13-35.

Computing paths and cycles in biological interaction graphs.

BMC Bioinformatics. 2009 Jun 15;10:181. doi: 10.1186/1471-2105-10-181.

A spectral graph convolution for signed directed graphs via magnetic Laplacian.

Neural Netw. 2023 Jul;164:562-574. doi: 10.1016/j.neunet.2023.05.009. Epub 2023 May 12.

Developing a novel causal inference algorithm for personalized biomedical causal graph learning using meta machine learning.

BMC Med Inform Decis Mak. 2024 May 27;24(1):137. doi: 10.1186/s12911-024-02510-6.

A linear programming approach for estimating the structure of a sparse linear genetic network from transcript profiling data.

Algorithms Mol Biol. 2009 Feb 24;4:5. doi: 10.1186/1748-7188-4-5.

Interpreting transcriptional changes using causal graphs: new methods and their practical utility on public networks.

BMC Bioinformatics. 2016 Aug 24;17(1):318. doi: 10.1186/s12859-016-1181-8.

Fitting a geometric graph to a protein-protein interaction network.

Bioinformatics. 2008 Apr 15;24(8):1093-9. doi: 10.1093/bioinformatics/btn079. Epub 2008 Mar 14.

From graph topology to ODE models for gene regulatory networks.

PLoS One. 2020 Jun 30;15(6):e0235070. doi: 10.1371/journal.pone.0235070. eCollection 2020.

Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.

BMC Bioinformatics. 2010 Nov 15;11:560. doi: 10.1186/1471-2105-11-560.

An algorithm for score aggregation over causal biological networks based on random walk sampling.

BMC Res Notes. 2014 Aug 11;7:516. doi: 10.1186/1756-0500-7-516.

Cited by

A Bayesian noisy logic model for inference of transcription factor activity from single cell and bulk transcriptomic data.

NAR Genom Bioinform. 2023 Dec 13;5(4):lqad106. doi: 10.1093/nargab/lqad106. eCollection 2023 Dec.

Integrating knowledge and omics to decipher mechanisms via large-scale models of signaling networks.

Mol Syst Biol. 2022 Jul;18(7):e11036. doi: 10.15252/msb.202211036.

CausalR: extracting mechanistic sense from genome scale data.

Bioinformatics. 2017 Nov 15;33(22):3670-3672. doi: 10.1093/bioinformatics/btx425.

Interpreting transcriptional changes using causal graphs: new methods and their practical utility on public networks.

BMC Bioinformatics. 2016 Aug 24;17(1):318. doi: 10.1186/s12859-016-1181-8.

Genome-wide expression analysis suggests a crucial role of dysregulation of matrix metalloproteinases pathway in undifferentiated thyroid carcinoma.

BMC Genomics. 2015 Mar 18;16(1):207. doi: 10.1186/s12864-015-1372-0.

An algorithm for score aggregation over causal biological networks based on random walk sampling.

BMC Res Notes. 2014 Aug 11;7:516. doi: 10.1186/1756-0500-7-516.

Integrative genomics with mediation analysis in a survival context.

Comput Math Methods Med. 2013;2013:413783. doi: 10.1155/2013/413783. Epub 2013 Dec 18.

Causal analysis approaches in Ingenuity Pathway Analysis.

Bioinformatics. 2014 Feb 15;30(4):523-30. doi: 10.1093/bioinformatics/btt703. Epub 2013 Dec 13.

Genes contributing to pain sensitivity in the normal population: an exome sequencing study.

PLoS Genet. 2012;8(12):e1003095. doi: 10.1371/journal.pgen.1003095. Epub 2012 Dec 20.
