Division of Clinical Pharmacology, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, United States of America.
PLoS One. 2012;7(8):e43301. doi: 10.1371/journal.pone.0043301. Epub 2012 Aug 14.
Genetic variation underlying the regulation of mRNA gene expression in humans may provide key insights into the molecular mechanisms of human traits and complex diseases. Current statistical methods to map genetic variation associated with mRNA gene expression have typically applied standard linkage and/or association methods; however, when genome-wide SNP and mRNA expression data are available, performing all pairwise comparisons is computationally burdensome and may not provide optimal power to detect associations. Alternative approaches that account for the high dimensionality and multiple testing issues may provide increased efficiency and statistical power. Here we present a novel approach to model and test the association between genetic variation and mRNA gene expression levels in the context of gene sets (GSs) and pathways, referred to as gene set-expression quantitative trait loci analysis (GS-eQTL). The method first uses GSs to group SNPs and mRNA expression, then applies principal components analysis (PCA) to collapse the variation and reduce the dimensionality within each GS. We applied GS-eQTL to assess the association between SNP and mRNA expression level data collected from a cell-based model system, using PharmGKB- and KEGG-defined GSs. We observed a large number of significant GS-eQTL associations, the most significant of which arose between genetic variation and mRNA expression from the same GS. However, a number of associations involving genetic variation and mRNA expression from different GSs were also identified. Our proposed GS-eQTL method effectively addresses the multiple testing limitations in eQTL studies and provides biological context for SNP-expression associations.
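The grouping and PCA steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how many principal components are retained or which test statistic is used, so this sketch assumes only the first component per GS and substitutes a permutation test on the Pearson correlation. All function names, gene set labels, and the simulated toy data are hypothetical.

```python
import math
import random


def first_pc_scores(rows, iters=200):
    """Per-sample scores on the first principal component (power iteration)."""
    n, p = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    x = [[v - m for v, m in zip(row, means)] for row in rows]  # center columns
    w = [float(j + 1) for j in range(p)]  # arbitrary non-zero starting direction
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, w)) for row in x]  # X w
        w_new = [sum(s * row[j] for s, row in zip(scores, x)) for j in range(p)]  # X^T X w
        norm = math.sqrt(sum(v * v for v in w_new))
        if norm == 0.0:  # no variation in this gene set
            break
        w = [v / norm for v in w_new]
    return [sum(a * b for a, b in zip(row, w)) for row in x]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation (assumed stand-in test)."""
    rng = random.Random(seed)
    observed = abs(pearson(a, b))
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(a, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def gs_eqtl(genotypes, expression, gene_sets, n_perm=2000, seed=0):
    """Test every (SNP gene set, expression gene set) pair on first-PC scores."""
    n = len(next(iter(expression.values())))

    def matrix(data, ids):  # samples in rows, SNPs or genes in columns
        return [[data[k][i] for k in ids] for i in range(n)]

    snp_pcs = {gs: first_pc_scores(matrix(genotypes, m["snps"])) for gs, m in gene_sets.items()}
    expr_pcs = {gs: first_pc_scores(matrix(expression, m["genes"])) for gs, m in gene_sets.items()}
    return {(s, e): permutation_pvalue(snp_pcs[s], expr_pcs[e], n_perm, seed)
            for s in snp_pcs for e in expr_pcs}


# Simulated toy data (not study data): expression in GS_A tracks SNP dosage.
rng = random.Random(1)
n_samples = 40
dosage_a = [rng.choice([0, 1, 2]) for _ in range(n_samples)]
genotypes = {
    "snpA1": dosage_a,
    "snpA2": [d if rng.random() < 0.9 else rng.choice([0, 1, 2]) for d in dosage_a],
    "snpB1": [rng.choice([0, 1, 2]) for _ in range(n_samples)],
    "snpB2": [rng.choice([0, 1, 2]) for _ in range(n_samples)],
}
expression = {
    "geneA1": [d + rng.gauss(0, 0.3) for d in dosage_a],
    "geneA2": [0.8 * d + rng.gauss(0, 0.3) for d in dosage_a],
    "geneB1": [rng.gauss(0, 1) for _ in range(n_samples)],
    "geneB2": [rng.gauss(0, 1) for _ in range(n_samples)],
}
gene_sets = {
    "GS_A": {"snps": ["snpA1", "snpA2"], "genes": ["geneA1", "geneA2"]},
    "GS_B": {"snps": ["snpB1", "snpB2"], "genes": ["geneB1", "geneB2"]},
}

results = gs_eqtl(genotypes, expression, gene_sets)
for (snp_gs, expr_gs), p in sorted(results.items()):
    print(f"SNPs in {snp_gs} vs expression in {expr_gs}: p = {p:.4f}")
```

With G gene sets, this design runs G × G tests instead of one test per SNP-transcript pair, which is the reduction in multiple testing burden the abstract refers to. Pairs with matching labels correspond to the same-GS associations, and mismatched labels to the cross-GS associations.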