IEEE/ACM Trans Comput Biol Bioinform. 2020 Mar-Apr;17(2):712-718. doi: 10.1109/TCBB.2019.2901473. Epub 2019 Feb 25.
Identifying gene network rewiring under different biological conditions is important for understanding the mechanisms underlying complex diseases. Gaussian graphical models, which assume that the data follow a multivariate normal distribution, are widely used to identify gene network rewiring. However, the normality assumption often fails in practice because the data are commonly contaminated by extreme outliers. In this study, we propose a new robust differential graphical model, based on the multivariate t-distribution, to identify gene network rewiring between two conditions. The multivariate t-distribution is more robust to outliers than the normal distribution because its heavy tails accommodate values far from the mean. A fused lasso penalty is used to borrow information across conditions and thereby improve the estimates. We develop an expectation-maximization algorithm to solve the optimization model. Experimental results on simulated data show that our method outperforms state-of-the-art methods. Our method is also applied to identify gene network rewiring between the luminal A and basal-like subtypes of breast cancer, and between the proneural and mesenchymal subtypes of glioblastoma. Several key genes that drive gene network rewiring are discovered.
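As a rough illustration of why a t-based model resists outliers, the sketch below runs a plain (unpenalized) expectation-maximization loop for the location and scatter of a multivariate t-distribution and evaluates a fused-lasso-style penalty on two precision matrices. It is not the authors' implementation: the abstract gives no algorithmic details, so the function names (`t_em_weights`, `fit_t_scatter`, `fused_lasso_penalty`), the fixed degrees of freedom `nu`, and the penalty form (borrowed from the standard fused graphical lasso) are all assumptions. The penalized M-step that a differential graphical model would need is omitted.

```python
import numpy as np


def t_em_weights(X, mu, theta, nu):
    """E-step: expected latent weight of each sample under a multivariate t.

    Samples with a large Mahalanobis distance receive a small weight,
    which is what makes the fit robust to outliers.
    """
    p = X.shape[1]
    diff = X - mu
    # Squared Mahalanobis distance of each row under precision matrix theta.
    delta = np.einsum("ij,jk,ik->i", diff, theta, diff)
    return (nu + p) / (nu + delta)


def fit_t_scatter(X, nu=3.0, n_iter=50):
    """Unpenalized EM for the location and scatter matrix (nu held fixed).

    Assumption: nu is known. A sparse, fused estimator would replace the
    closed-form scatter update with a penalized optimization step.
    """
    n = X.shape[0]
    mu = np.median(X, axis=0)
    sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        w = t_em_weights(X, mu, np.linalg.inv(sigma), nu)
        # M-step: weighted mean and weighted scatter matrix.
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        sigma = (w[:, None] * diff).T @ diff / n
    return mu, sigma, w


def fused_lasso_penalty(theta1, theta2, lam1, lam2):
    """Hypothetical penalty in the style of the fused graphical lasso.

    lam1 encourages sparse off-diagonal entries in each network;
    lam2 encourages the two networks to share entries.
    """
    off_diag = ~np.eye(theta1.shape[0], dtype=bool)
    sparsity = np.abs(theta1[off_diag]).sum() + np.abs(theta2[off_diag]).sum()
    fusion = np.abs(theta1 - theta2).sum()
    return lam1 * sparsity + lam2 * fusion


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 3))
    X[:5] += 50.0  # five gross outliers
    mu, sigma, w = fit_t_scatter(X)
    print("outlier weights:", w[:5])
    print("median weight:  ", np.median(w))
```

In this sketch the contaminated rows end up with weights far below the typical weight, so they contribute little to the estimated mean and scatter. A Gaussian fit corresponds to giving every sample a weight of 1, which lets the same rows dominate the sample covariance.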