Mukherjee Sach, Pelech Steven, Neve Richard M, Kuo Wen-Lin, Ziyad Safiyyah, Spellman Paul T, Gray Joe W, Speed Terence P
Department of Statistics, University of Warwick, Coventry, UK.
Bioinformatics. 2009 Jan 15;25(2):265-71. doi: 10.1093/bioinformatics/btn611. Epub 2008 Nov 27.
Combinatorial effects, in which several variables jointly influence an output or response, play an important role in biological systems. In many settings, Boolean functions provide a natural way to describe such influences. However, the biochemical data with which we may wish to characterize such influences are usually subject to much variability. Furthermore, in high-throughput biological settings, Boolean relationships of interest are very often sparse, in the sense of being embedded in an overall dataset of higher dimensionality. This motivates a need for statistical methods capable of making inferences regarding Boolean functions under conditions of noise and sparsity.
We put forward a statistical model for sparse, noisy Boolean functions and methods for inference under the model. We focus on the case in which the form of the underlying Boolean function, as well as the number and identity of its inputs are all unknown. We present results on synthetic data and on a study of signalling proteins in cancer biology.
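To make the setting concrete, here is a minimal Python sketch of a sparse, noisy Boolean function and a naive way to recover it. It is an illustration under stated assumptions, not the statistical model or inference method put forward in the paper. It assumes binary inputs drawn uniformly at random and a response that depends on a small hypothetical subset of inputs (here an AND of variables 2 and 5 out of 8), with each output bit flipped independently with probability `noise`. Recovery uses a brute-force search that scores every small subset by the errors of its best-fitting truth table plus a complexity penalty. The names `simulate`, `fit_truth_table`, `search` and the `penalty` weight are hypothetical and chosen only for this example.

```python
import itertools
import random


def simulate(n_samples, n_vars, inputs, func, noise, seed=0):
    """Draw uniform binary covariates. The response depends only on the
    variables in `inputs` via `func` (a sparse Boolean function), and each
    response bit is flipped independently with probability `noise`."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(n_samples)]
    y = []
    for row in X:
        clean = int(func(*(row[i] for i in inputs)))
        flipped = rng.random() < noise
        y.append(clean ^ int(flipped))
    return X, y


def fit_truth_table(X, y, subset):
    """Fit the best Boolean function of the variables in `subset`: for each
    observed input pattern, output the majority label.
    Returns (truth_table, number_of_training_errors)."""
    counts = {}
    for row, label in zip(X, y):
        key = tuple(row[i] for i in subset)
        counts.setdefault(key, [0, 0])[label] += 1
    table = {key: int(c[1] > c[0]) for key, c in counts.items()}
    errors = sum(min(c) for c in counts.values())
    return table, errors


def search(X, y, max_inputs, penalty=2.0):
    """Exhaustively score every subset of at most `max_inputs` variables.
    Score = training errors + penalty * 2**k, where 2**k is the number of
    truth-table entries for k inputs. Without the penalty, larger subsets
    would never fit worse and could win simply by fitting the noise."""
    n_vars = len(X[0])
    best = None
    for k in range(max_inputs + 1):
        for subset in itertools.combinations(range(n_vars), k):
            table, errors = fit_truth_table(X, y, subset)
            score = errors + penalty * 2 ** k
            if best is None or score < best[0]:
                best = (score, subset, table)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical example: 8 binary variables, response = x2 AND x5, 10% label noise.
    X, y = simulate(400, 8, inputs=(2, 5), func=lambda a, b: a and b, noise=0.1)
    subset, table = search(X, y, max_inputs=3)
    print("selected inputs:", subset)
    print("fitted truth table:", dict(sorted(table.items())))
```

Exhaustive search over subsets only scales to small numbers of variables and inputs, and the penalty weight here is an arbitrary choice; a full statistical treatment of noise and model complexity would replace this ad hoc scoring rule.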