Division of Biostatistics, University of California, Berkeley, CA, USA.
Bioinformatics. 2012 Mar 15;28(6):815-22. doi: 10.1093/bioinformatics/bts038. Epub 2012 Jan 23.
Pathway genes are a group of genes that work cooperatively in the same pathway and constitute a fundamental functional grouping in a biological process. Identifying pathway genes has been one of the major tasks in understanding biological processes. However, owing to the difficulty of characterizing and inferring different types of biological gene relationships, as well as several computational issues that arise with high-dimensional biological data, deducing the genes in a pathway remains challenging.
In this work, we elucidate higher level gene-gene interactions by evaluating the conditional dependencies between genes, i.e. the relationships between genes after removing the influences of a set of previously known pathway genes. These previously known pathway genes serve as seed genes in our model and guide the detection of other genes involved in the same pathway. The statistical approach involves estimating a precision matrix, whose off-diagonal elements are known to be proportional to the partial correlations (i.e. conditional dependencies) between genes under appropriate normality assumptions. Likelihood ratio tests on two forms of the precision matrix are then performed to determine whether a candidate pathway gene is conditionally independent of all the previously known pathway genes. When used effectively, this is a promising approach for recovering gene relationships that standard methods would otherwise miss. The advantage of the proposed method is demonstrated using both simulation studies and real datasets. We also demonstrate the importance of taking experimental dependencies into account in both the simulation and real data studies.
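The two statistical ingredients named above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes independent, normally distributed samples (so it ignores the experimental dependencies the abstract stresses), more samples than genes, and a plain sample covariance rather than a regularized estimate. The function names `partial_correlations` and `lrt_candidate_vs_seeds` are hypothetical. The first function rescales the precision matrix into partial correlations; the second compares an unrestricted precision matrix with one whose candidate-seed entries are forced to zero, which for a Gaussian model restricted to the seeds and the candidate reduces to the classical likelihood ratio test of independence between two sets of variables.

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlations from a covariance matrix.

    Uses rho_ij = -omega_ij / sqrt(omega_ii * omega_jj), where omega is the
    precision matrix (inverse covariance).
    """
    omega = np.linalg.inv(cov)
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def lrt_candidate_vs_seeds(data, seed_idx, cand_idx):
    """Likelihood ratio statistic for H0: the candidate gene is independent
    of all seed genes (zero candidate-seed entries in the precision matrix).

    data: (n_samples, n_genes) array. Assumes i.i.d. Gaussian samples.
    Returns (statistic, degrees_of_freedom); under H0 the statistic is
    asymptotically chi-squared with len(seed_idx) degrees of freedom.
    """
    idx = list(seed_idx) + [cand_idx]
    n = data.shape[0]
    k = len(seed_idx)
    # Maximum likelihood covariance estimate (divides by n, not n - 1).
    S = np.cov(data[:, idx], rowvar=False, bias=True)
    _, logdet_full = np.linalg.slogdet(S)
    _, logdet_seed = np.linalg.slogdet(S[:k, :k])
    stat = n * (logdet_seed + np.log(S[k, k]) - logdet_full)
    return stat, k

# Toy precision matrix for three genes: genes 0 and 2 are conditionally
# independent given gene 1 (their precision entry is zero).
omega = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
pcor = partial_correlations(np.linalg.inv(omega))
# Entries (0,1) and (1,2) are 0.5; entry (0,2) is 0 up to rounding.
print(np.round(pcor, 3))
```

In practice the statistic would be compared against a chi-squared reference distribution (for example via `scipy.stats.chi2.sf`), and a high-dimensional setting would call for a regularized precision estimate in place of the plain matrix inverse used here.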