Yeung Kar-Fu, Yang Yi, Yang Can, Liu Jin
Centre for Quantitative Medicine, Programme in Health Services and System Research, Duke-NUS Medical School, Singapore.
Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong, China.
Bioinform Biol Insights. 2019 Oct 13;13:1177932219881435. doi: 10.1177/1177932219881435. eCollection 2019.
Genome-wide association study (GWAS) analyses have identified thousands of associations between genetic variants and complex traits. However, uncovering the mechanisms underlying these associations remains a challenge. With the growing availability of transcriptome data sets, it has become possible to perform statistical analyses targeted at identifying influential genes whose expression levels correlate with the phenotype. Methods such as PrediXcan and transcriptome-wide association study (TWAS) use the transcriptome data set to fit a predictive model for gene expression, with genetic variants as covariates. The gene expression levels for the GWAS data set are then 'imputed' using the prediction model, and the imputed expression levels are tested for their association with the phenotype. Because these methods treat the imputed expression levels as if they were observed, they fail to account for the uncertainty in the imputation step. We propose a collaborative mixed model (CoMM) that addresses this limitation by jointly modelling the multiple analysis steps. We illustrate CoMM's ability to identify relevant genes in the Northern Finland Birth Cohort 1966 data set and extend the model to handle the more widely available GWAS summary statistics.
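The two-stage procedure described above (fit a prediction model on the transcriptome data, impute expression in the GWAS data, then test the imputed values) can be sketched roughly as follows. This is a minimal illustration on simulated data, not the PrediXcan, TWAS or CoMM implementation: the ridge penalty, the sample sizes, the effect sizes and the function names `fit_expression_weights` and `association_t_stat` are all assumptions made for the example.

```python
import numpy as np


def fit_expression_weights(X_ref, expr, ridge=1.0):
    """Stage 1: ridge regression of gene expression on genotypes (reference panel)."""
    n_snps = X_ref.shape[1]
    gram = X_ref.T @ X_ref + ridge * np.eye(n_snps)
    return np.linalg.solve(gram, X_ref.T @ expr)


def association_t_stat(imputed, phenotype):
    """Stage 2: t-statistic for the slope of phenotype regressed on imputed expression."""
    g = imputed - imputed.mean()
    z = phenotype - phenotype.mean()
    slope = (g @ z) / (g @ g)
    resid = z - slope * g
    sigma2 = (resid @ resid) / (len(z) - 2)
    return slope / np.sqrt(sigma2 / (g @ g))


# Simulated data (hypothetical sizes and effects, for illustration only).
rng = np.random.default_rng(0)
n_ref, n_gwas, n_snps = 300, 2000, 20
true_w = 0.3 * rng.normal(size=n_snps)

X_ref = rng.binomial(2, 0.3, size=(n_ref, n_snps)).astype(float)
X_gwas = rng.binomial(2, 0.3, size=(n_gwas, n_snps)).astype(float)
X_ref -= X_ref.mean(axis=0)
X_gwas -= X_gwas.mean(axis=0)

# Expression is observed only in the reference panel; the phenotype only in the GWAS.
expr_ref = X_ref @ true_w + rng.normal(size=n_ref)
phenotype = 0.5 * (X_gwas @ true_w) + rng.normal(size=n_gwas)

weights = fit_expression_weights(X_ref, expr_ref)   # stage 1: fit prediction model
imputed = X_gwas @ weights                          # impute expression in the GWAS
t_stat = association_t_stat(imputed, phenotype)     # stage 2: test association
print(f"t-statistic for imputed expression: {t_stat:.2f}")
```

Note that stage 2 uses `weights` as fixed numbers, so the estimation error from stage 1 never enters the test statistic. This is the uncertainty that a joint model such as CoMM is intended to account for.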