The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel, Department of Statistics and OR, School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel, Functional Brain Center, Wohl Institute for Advanced Imaging, Tel Aviv Sourasky Medical Center, Tel Aviv 64239, Israel and Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel.
Bioinformatics. 2015 Jun 15;31(12):i17-26. doi: 10.1093/bioinformatics/btv228.
Detecting modules of co-ordinated activity is fundamental in the analysis of large biological studies. For two-dimensional data (e.g. genes × patients), this is often done via clustering or biclustering. More recently, studies monitoring patients over time have added another dimension. Analysis is much more challenging in this case, especially when time measurements are not synchronized. New methods that can analyze three-way data are thus needed.
We present a new algorithm for finding coherent and flexible modules in three-way data. Our method can identify both core modules that appear in multiple patients and patient-specific augmentations of these core modules that contain additional genes. Our algorithm is based on a hierarchical Bayesian data model and Gibbs sampling. The algorithm outperforms extant methods on simulated and on real data. The method successfully dissected key components of septic shock response from time series measurements of gene expression. Detected patient-specific module augmentations were informative for disease outcome. In analyzing brain functional magnetic resonance imaging time series of subjects at rest, it detected the pertinent brain regions involved.
R code and data are available at http://acgt.cs.tau.ac.il/twigs/.
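To make the distinction between a core module and its patient-specific augmentations concrete, the following is a minimal sketch of the output structure only. It is not the algorithm described above, which infers modules with a hierarchical Bayesian model and Gibbs sampling; here the per-patient gene sets are assumed to be already known, and the function name, the `min_patients` threshold, and the toy gene and patient labels are all hypothetical.

```python
from collections import Counter


def split_core_and_augmentations(patient_genes, min_patients=2):
    """Split per-patient gene sets into a shared core and per-patient extras.

    patient_genes: dict mapping a patient label to the set of genes assigned
                   to the module in that patient (assumed given; the real
                   method would infer these from the time series).
    min_patients:  hypothetical threshold for calling a gene part of the core.
    """
    # Count in how many patients each gene appears.
    counts = Counter(g for genes in patient_genes.values() for g in set(genes))
    # Core: genes shared by at least `min_patients` patients.
    core = {g for g, n in counts.items() if n >= min_patients}
    # Augmentation: genes a patient has in addition to the core.
    augmentations = {p: set(genes) - core for p, genes in patient_genes.items()}
    return core, augmentations


# Hypothetical toy input: three patients, four genes.
patient_genes = {
    "patient_A": {"g1", "g2", "g3"},
    "patient_B": {"g1", "g2", "g4"},
    "patient_C": {"g1", "g2"},
}
core, augmentations = split_core_and_augmentations(patient_genes)
print(sorted(core))
print({p: sorted(genes) for p, genes in augmentations.items()})
```

In this toy input, `g1` and `g2` form the core because every patient carries them, while `g3` and `g4` are augmentations specific to one patient each. The abstract reports that such patient-specific additions were informative for disease outcome.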