Cheng Wei, Shi Yu, Zhang Xiang, Wang Wei
Department of Computer Science, UNC at Chapel Hill, 201 S Columbia St., Chapel Hill, 27599, NC, USA.
Computer Science at the University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, 61801, IL, USA.
BMC Bioinformatics. 2015 Jan 16;16:2. doi: 10.1186/s12859-014-0421-z.
Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool for understanding the genetic basis of gene expression and complex traits. Traditional eQTL methods test for associations between individual single-nucleotide polymorphisms (SNPs) and gene expression traits. A major drawback of this approach is that it cannot model the joint effect of a set of SNPs on a set of genes, which may correspond to hidden biological pathways.
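As a point of reference, the per-pair testing described above can be sketched as a correlation scan over every SNP-gene pair. This is only an illustration on toy data, not the procedure of any particular eQTL tool: the function name `pairwise_association`, the 0/1/2 genotype coding, and the simulated effect are all assumptions.

```python
import numpy as np

def pairwise_association(snps, expr):
    """Pearson correlation between every SNP column and every gene column.

    snps: (n_samples, n_snps) genotype matrix (hypothetical 0/1/2 coding)
    expr: (n_samples, n_genes) expression matrix
    Returns an (n_snps, n_genes) matrix of correlations.
    """
    xs = (snps - snps.mean(axis=0)) / snps.std(axis=0)
    ys = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    return xs.T @ ys / snps.shape[0]

# Toy data: 500 samples, 4 SNPs, 3 genes; gene 0 is driven by SNP 1 only.
rng = np.random.default_rng(0)
n = 500
snps = rng.integers(0, 3, size=(n, 4)).astype(float)
expr = rng.normal(size=(n, 3))
expr[:, 0] += 1.0 * snps[:, 1]

corr = pairwise_association(snps, expr)
top_snp_for_gene0 = int(np.argmax(np.abs(corr[:, 0])))
```

Each entry of `corr` is computed from one SNP and one gene in isolation, which is why a scan of this kind has no way to represent a set of SNPs acting jointly on a set of genes.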
We introduce a new approach for identifying group-wise associations between sets of SNPs and sets of genes. Such associations are captured by hidden variables connecting SNPs and genes. The model is linear-Gaussian and uses two types of hidden variables: one captures the set associations between SNPs and genes, and the other captures confounders. We develop an efficient optimization procedure that makes this approach suitable for large-scale studies. Extensive experimental evaluations on both simulated and real datasets demonstrate that the proposed methods can effectively capture both individual and group-wise signals that state-of-the-art eQTL mapping methods cannot identify.
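To make the two kinds of hidden variables concrete, the following is a minimal generative sketch of one possible linear-Gaussian model of this shape. The abstract gives no equations, so every symbol here is an assumption: `A` maps SNPs to group-level hidden variables `Z`, `B` maps `Z` to genes, `S` holds confounders with loadings `W`, and the sparsity level and noise scales are arbitrary. It simulates data only and does not reproduce the authors' optimization procedure.

```python
import numpy as np

def simulate_two_layer(n, n_snps, n_groups, n_genes, n_conf, seed=0):
    """Simulate from a hypothetical two-layer linear-Gaussian eQTL model.

    Z = X A + noise        (hidden variables linking SNP sets to gene sets)
    Y = Z B + S W + noise  (S: hidden confounders, independent of X)
    """
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n, n_snps)).astype(float)  # toy 0/1/2 genotypes
    # Sparse weights, so each hidden variable touches a small set of SNPs/genes.
    A = rng.normal(size=(n_snps, n_groups)) * (rng.random((n_snps, n_groups)) < 0.2)
    B = rng.normal(size=(n_groups, n_genes)) * (rng.random((n_groups, n_genes)) < 0.2)
    W = rng.normal(size=(n_conf, n_genes))
    Z = X @ A + 0.1 * rng.normal(size=(n, n_groups))
    S = rng.normal(size=(n, n_conf))
    Y = Z @ B + S @ W + 0.1 * rng.normal(size=(n, n_genes))
    return X, Y, A, B

X, Y, A, B = simulate_two_layer(n=5000, n_snps=10, n_groups=3, n_genes=5, n_conf=2)

# Under this sketch the net SNP-to-gene effect is the product A @ B.
# Ordinary least squares on centered data approximates it, because the
# simulated confounders are independent of the genotypes.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
effect_hat, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
```

In this sketch a group-wise association corresponds to a hidden variable whose column of `A` and row of `B` are both non-zero on more than one entry, so that several SNPs influence several genes through a single intermediate.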
Considering group-wise associations significantly improves the accuracy of eQTL mapping, and the multi-layer regression model offers a new way to understand how multiple SNPs interact to jointly affect the expression levels of a group of genes.