Lee JungJun, Kim SungHwan, Jhong Jae-Hwan, Koo Ja-Yong
Department of Statistics, Korea University, Seoul 02841, Republic of Korea.
Department of Applied Statistics, Konkuk University, Seoul 05029, Republic of Korea.
Comput Math Methods Med. 2018 Jun 25;2018:4626307. doi: 10.1155/2018/4626307. eCollection 2018.
In genomic data analysis, the underlying regulatory relationships among multiple genes are often difficult to ascertain because of unknown genetic complexity and epigenetic regulation. In this paper, we consider a joint mean and constant covariance model (JMCCM) that elucidates the conditional dependence structure among genes while controlling for potential genotype perturbations. To this end, the modified Cholesky decomposition is used to parametrize the entries of the precision matrix, and the model parameters are estimated by maximum likelihood. We also develop a variable selection algorithm that selects explanatory variables and Cholesky factors using a combination of generalized cross-validation (GCV) and the Bayesian information criterion (BIC) as selection criteria, together with Rao and Wald statistics. Importantly, the proposed variable selection scheme yields a sparse estimate of the precision matrix (or, equivalently, of the gene network) and helps identify significant hub genes that are concordant with biological evidence. In simulation studies, we confirm that our model selection procedure efficiently identifies the true underlying networks. In an application to miRNA and SNP data from yeast (i.e., eQTL data), we demonstrate that the constructed gene networks reproduce validated biological and clinical knowledge about various pathways, including the cell cycle pathway.
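To illustrate the parametrization mentioned in the abstract, the following is a minimal sketch of the standard modified Cholesky decomposition, not the authors' implementation. It assumes a known positive-definite covariance matrix `sigma` and finds a unit lower-triangular matrix `T` and positive innovation variances `d` such that T Σ Tᵀ = diag(d), so that the precision matrix is Tᵀ diag(d)⁻¹ T. The function names `modified_cholesky` and `precision_from_factors` are hypothetical, and the abstract does not specify how the JMCCM estimates or selects these factors.

```python
import numpy as np


def modified_cholesky(sigma):
    """Return (T, d) with T unit lower triangular and T @ sigma @ T.T == diag(d).

    Row j of T holds the negated coefficients from regressing variable j
    on variables 0..j-1; d[j] is the corresponding residual variance.
    """
    sigma = np.asarray(sigma, dtype=float)
    p = sigma.shape[0]
    T = np.eye(p)
    d = np.empty(p)
    d[0] = sigma[0, 0]
    for j in range(1, p):
        # Population regression coefficients of variable j on its predecessors.
        phi = np.linalg.solve(sigma[:j, :j], sigma[:j, j])
        T[j, :j] = -phi
        # Innovation (residual) variance of variable j.
        d[j] = sigma[j, j] - sigma[:j, j] @ phi
    return T, d


def precision_from_factors(T, d):
    """Rebuild the precision matrix as T.T @ diag(1/d) @ T."""
    return T.T @ np.diag(1.0 / d) @ T


# Illustrative covariance matrix (synthetic, not data from the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
sigma = A @ A.T + 5.0 * np.eye(5)

T, d = modified_cholesky(sigma)
omega = precision_from_factors(T, d)
```

In this parametrization, a zero in the lower triangle of `T` corresponds to a predecessor dropped from the regression for that variable, which is why selecting Cholesky factors can produce a sparse precision matrix.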