Laboratoire de Biométrie et Biologie Évolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, Villeurbanne, France.
Proc Natl Acad Sci U S A. 2011 Jan 11;108(2):882-7. doi: 10.1073/pnas.1004751108. Epub 2010 Dec 27.
External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing the cell to react to a wide spectrum of environmental changes. High-throughput experiments identify numerous molecular components of such cascades, but these components may interact through unknown partners. Some of these partners may be detected by integrating a protein-protein interaction network with mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization problem turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired by statistical physics, and apply it to alpha-factor and drug-perturbation data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is especially well suited to very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On well-known benchmarks it outperforms other algorithms in the field.
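The mapping onto an optimal connected subgraph can be illustrated on a toy network. The sketch below is not the distributed algorithm of the paper. It is a brute-force search, usable only on a handful of nodes, and it assumes an objective the abstract does not spell out: each node carries a prize (for example, derived from expression data), each edge carries a cost (for example, reflecting interaction confidence), and the score of a connected node set is its total prize minus the cost of its cheapest spanning tree. The node names, prizes, costs, and the function names `spanning_tree_cost` and `best_connected_subgraph` are all hypothetical.

```python
from itertools import combinations


def spanning_tree_cost(nodes, edge_cost):
    """Kruskal's algorithm on the subgraph induced by `nodes`.

    Returns the minimum spanning tree cost, or None when the induced
    subgraph is disconnected.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, used = 0.0, 0
    for (u, v), cost in sorted(edge_cost.items(), key=lambda kv: kv[1]):
        if u in parent and v in parent:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                total += cost
                used += 1
    # A spanning tree over k nodes has exactly k - 1 edges.
    return total if used == len(parent) - 1 else None


def best_connected_subgraph(prize, edge_cost):
    """Exhaustively score every connected node subset (exponential time)."""
    best_score, best_nodes = float("-inf"), frozenset()
    names = sorted(prize)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            cost = spanning_tree_cost(subset, edge_cost)
            if cost is None:
                continue  # skip disconnected subsets
            score = sum(prize[n] for n in subset) - cost
            if score > best_score:
                best_score, best_nodes = score, frozenset(subset)
    return best_score, best_nodes


# Hypothetical toy data: A and B respond strongly, X shows no signal,
# and A and B are linked only through X.
prize = {"A": 2.0, "B": 2.0, "X": 0.0, "C": 0.1}
edge_cost = {("A", "X"): 0.5, ("X", "B"): 0.5, ("B", "C"): 1.0}

score, nodes = best_connected_subgraph(prize, edge_cost)
print(score, sorted(nodes))  # 3.0 ['A', 'B', 'X']
```

In this toy case the best subgraph includes X even though X has no prize of its own, because it is the cheapest way to connect A and B. That is the sense in which an undetected intermediate partner can be recovered from the network. The exhaustive enumeration grows exponentially with the number of nodes, which is consistent with the abstract's remark that the problem is intractable in general.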