Choi Yoonha, Coram Marc, Peng Jie, Tang Hua
1 Department of Genetics, Stanford University, Stanford, California.
2 Department of Health Research and Policy, Stanford University, Stanford, California.
J Comput Biol. 2017 Jul;24(7):721-731. doi: 10.1089/cmb.2017.0053. Epub 2017 May 30.
Constructing expression networks using transcriptomic data is an effective approach for studying gene regulation. A popular approach for constructing such a network is based on the Gaussian graphical model (GGM), in which an edge between a pair of genes indicates that the expression levels of these two genes are conditionally dependent, given the expression levels of all other genes. However, GGMs are not appropriate for non-Gaussian data, such as those generated in RNA-seq experiments. We propose a novel statistical framework that maximizes a penalized likelihood, in which the observed count data follow a Poisson log-normal distribution. To overcome the computational challenges, we use Laplace's method to approximate the likelihood and its gradients, and apply the alternating directions method of multipliers to find the penalized maximum likelihood estimates. The proposed method is evaluated and compared with GGMs using both simulated and real RNA-seq data. The proposed method shows improved performance in detecting edges that represent covarying pairs of genes, particularly for edges connecting low-abundant genes and edges around regulatory hubs.
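To make the role of Laplace's method concrete, the following is a minimal univariate sketch, not the authors' implementation. It assumes a single gene whose count y is Poisson with rate exp(z), where the latent z is normal with mean mu and variance s2, and it approximates the marginal log-likelihood log p(y) by a second-order expansion around the mode of the joint log-density. The function names, the parameter values, and the Gauss-Hermite reference calculation are illustrative assumptions; the method described above works with the multivariate likelihood and adds a penalty solved by the alternating directions method of multipliers, neither of which is shown here.

```python
import math
import numpy as np


def log_joint(z, y, mu, s2):
    """Log of Poisson(y | exp(z)) * Normal(z | mu, s2) for a scalar latent z."""
    log_pois = y * z - math.exp(z) - math.lgamma(y + 1)
    log_norm = -0.5 * math.log(2 * math.pi * s2) - (z - mu) ** 2 / (2 * s2)
    return log_pois + log_norm


def laplace_log_marginal(y, mu, s2, tol=1e-10, max_iter=200):
    """Laplace approximation to log p(y) under a univariate Poisson log-normal.

    Illustrative only: intended for moderate counts and variances, where
    exp(z) cannot overflow during the Newton iterations.
    """
    z = mu  # start the mode search at the prior mean
    for _ in range(max_iter):
        grad = y - math.exp(z) - (z - mu) / s2   # first derivative of log_joint
        hess = -math.exp(z) - 1.0 / s2           # second derivative, always negative
        step = grad / hess
        z -= step                                # Newton update toward the mode
        if abs(step) < tol:
            break
    curvature = math.exp(z) + 1.0 / s2           # negative Hessian at the mode
    # log-integral of exp(log_joint) ~ log_joint(mode) + 0.5 * log(2*pi / curvature)
    return log_joint(z, y, mu, s2) + 0.5 * math.log(2 * math.pi / curvature)


def quadrature_log_marginal(y, mu, s2, n_nodes=80):
    """Reference value of log p(y) by Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    z = mu + np.sqrt(2 * s2) * x                 # change of variables for N(mu, s2)
    log_pois = y * z - np.exp(z) - math.lgamma(y + 1)
    return float(np.log(np.sum(w * np.exp(log_pois)) / np.sqrt(np.pi)))


if __name__ == "__main__":
    y, mu, s2 = 3, 1.0, 0.5                      # hypothetical values for illustration
    print("Laplace   :", laplace_log_marginal(y, mu, s2))
    print("Quadrature:", quadrature_log_marginal(y, mu, s2))
```

In one dimension the quadrature is cheap, so the approximation is unnecessary; the motivation for Laplace's method is that with many genes the latent integral is high-dimensional, where grid-based quadrature becomes impractical.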