Assessing statistical significance in causal graphs.

Affiliation

Computational Sciences Center of Emphasis, Pfizer Worldwide Research & Development, Cambridge, MA, USA.

Publication information

BMC Bioinformatics. 2012 Feb 20;13:35. doi: 10.1186/1471-2105-13-35.

Abstract

BACKGROUND

Causal graphs are an increasingly popular tool for the analysis of biological datasets. In particular, signed causal graphs--directed graphs whose edges additionally have a sign denoting upregulation or downregulation--can be used to model regulatory networks within a cell. Such models allow prediction of downstream effects of regulation of biological entities; conversely, they also enable inference of causative agents behind observed expression changes. However, due to their complex nature, signed causal graph models present special challenges with respect to assessing statistical significance. In this paper we frame and solve two fundamental computational problems that arise in practice when computing appropriate null distributions for hypothesis testing.

RESULTS

First, we show how to compute a p-value for agreement between observed and model-predicted classifications of gene transcripts as upregulated, downregulated, or neither. Specifically, how likely are the classifications to agree to the same extent under the null distribution of the observed classification being randomized? This problem, which we call "Ternary Dot Product Distribution" owing to its mathematical form, can be viewed as a generalization of Fisher's exact test to ternary variables. We present two computationally efficient algorithms for computing the Ternary Dot Product Distribution and investigate its combinatorial structure analytically and numerically to establish computational complexity bounds.

Second, we develop an algorithm for efficiently performing random sampling of causal graphs. This enables p-value computation under a different, equally important null distribution obtained by randomizing the graph topology but keeping fixed its basic structure: connectedness and the positive and negative in- and out-degrees of each vertex. We provide an algorithm for sampling a graph from this distribution uniformly at random. We also highlight theoretical challenges unique to signed causal graphs; previous work on graph randomization has studied undirected graphs and directed but unsigned graphs.
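To make the first problem concrete, the following is a minimal Python sketch of the Ternary Dot Product Distribution computed by direct enumeration. It is not either of the paper's two algorithms, which the abstract does not spell out. It assumes that predictions and observations are encoded as +1 (upregulated), -1 (downregulated), and 0 (neither), that the agreement score is their dot product, and that the null distribution permutes the observed labels uniformly while keeping their counts fixed. The function names `ternary_dot_product_distribution` and `agreement_p_value` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def multinomial(*parts):
    """Ways to split sum(parts) labelled items into groups of the given sizes."""
    result = factorial(sum(parts))
    for p in parts:
        result //= factorial(p)
    return result


def ternary_dot_product_distribution(predicted, observed):
    """Exact null distribution of the dot product of two ternary vectors.

    Assumption (not taken from the paper): the null permutes `observed`
    uniformly at random. Enumerates every 3x3 contingency table, so it is a
    brute-force illustration rather than an efficient algorithm.
    """
    if len(predicted) != len(observed):
        raise ValueError("vectors must have equal length")
    if any(v not in (-1, 0, 1) for v in list(predicted) + list(observed)):
        raise ValueError("entries must be -1, 0, or +1")
    q, n = Counter(predicted), Counter(observed)
    q_plus, q_minus, q_zero = q[1], q[-1], q[0]
    n_plus, n_minus, n_zero = n[1], n[-1], n[0]
    total = multinomial(n_plus, n_minus, n_zero)  # distinct label arrangements
    dist = Counter()
    for a in range(min(q_plus, n_plus) + 1):                    # pred +, obs +
        for b in range(min(q_plus - a, n_minus) + 1):           # pred +, obs -
            for c in range(min(q_minus, n_plus - a) + 1):       # pred -, obs +
                for d in range(min(q_minus - c, n_minus - b) + 1):  # pred -, obs -
                    plus_zero = q_plus - a - b     # pred +, obs 0
                    minus_zero = q_minus - c - d   # pred -, obs 0
                    zero_plus = n_plus - a - c     # pred 0, obs +
                    zero_minus = n_minus - b - d   # pred 0, obs -
                    zero_zero = q_zero - zero_plus - zero_minus
                    if zero_zero < 0:
                        continue  # table is inconsistent with the margins
                    ways = (multinomial(a, b, plus_zero)
                            * multinomial(c, d, minus_zero)
                            * multinomial(zero_plus, zero_minus, zero_zero))
                    # Score = agreements minus sign disagreements.
                    dist[a + d - b - c] += Fraction(ways, total)
    return dict(dist)


def agreement_p_value(predicted, observed):
    """One-sided p-value: probability of a score at least as large as observed."""
    score = sum(p * o for p, o in zip(predicted, observed))
    dist = ternary_dot_product_distribution(predicted, observed)
    return sum(prob for s, prob in dist.items() if s >= score)
```

For example, `agreement_p_value([1, -1], [1, -1])` returns `Fraction(1, 2)`: of the two arrangements of the observed labels, one gives a score of 2 and the other a score of -2. Exact rational arithmetic is used here only to keep the illustration free of rounding error; the four nested loops scale poorly, which is the kind of cost the efficient algorithms mentioned in the abstract are meant to avoid.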

CONCLUSION

We present algorithmic solutions to two statistical significance questions necessary to apply the causal graph methodology, a powerful tool for biological network analysis. The algorithms we present are both fast and provably correct. Our work may be of independent interest in non-biological contexts as well, as it generalizes mathematical results that have been studied extensively in other fields.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6867/3307026/6b46881f91d8/1471-2105-13-35-1.jpg
