School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China.
Centre for Quantitative Medicine, Program in Health Services & Systems Research, Duke-NUS Medical School, 169857, Singapore.
Bioinformatics. 2020 Apr 1;36(7):2009-2016. doi: 10.1093/bioinformatics/btz880.
Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links that underlie how genetic variants cause complex traits remain elusive. To advance our understanding of these links, various consortia have collected a vast volume of genomic data that enable us to investigate the role that genetic variants play in gene expression regulation. Recently, a collaborative mixed model (CoMM) was proposed to jointly interrogate the genome on complex traits by integrating both a GWAS dataset and an expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and cannot make full use of widely available GWAS summary statistics. Therefore, statistically efficient methods that leverage transcriptome information using only GWAS summary statistics are required.
In this study, we propose a novel probabilistic model, CoMM-S2, to examine the mechanistic role that genetic variants play by using only GWAS summary statistics instead of individual-level GWAS data. Similar to CoMM, which uses individual-level GWAS data, CoMM-S2 combines two models: the first model examines the relationship between gene expression and genotype, while the second model examines the relationship between the phenotype and the gene expression predicted by the first model. Distinct from CoMM, CoMM-S2 requires only GWAS summary statistics. Using both simulation studies and real data analysis, we demonstrate that even though CoMM-S2 uses only GWAS summary statistics, its performance is comparable to that of CoMM, which uses individual-level GWAS data.
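As a rough illustration of the two ideas in this paragraph (the two linked models, and what "GWAS summary statistics" contain), the following Python sketch simulates a small eQTL panel and a GWAS cohort, computes per-SNP marginal effect estimates and standard errors, and then applies a naive two-stage plug-in regression. It is not the CoMM-S2 algorithm: CoMM-S2 fits a joint probabilistic model, whereas this sketch swaps in a ridge regression followed by ordinary least squares purely for illustration. All function names, sample sizes, effect sizes and the ridge penalty are assumptions made for this example and do not come from the paper or the CoMM package.

```python
import numpy as np


def simulate(n_eqtl=300, n_gwas=2000, n_snps=20, alpha=0.5, seed=0):
    """Simulate toy eQTL and GWAS datasets (all settings are assumptions)."""
    rng = np.random.default_rng(seed)
    gamma = rng.normal(0.0, 0.3, n_snps)  # SNP effects on expression
    g_eqtl = rng.binomial(2, 0.3, (n_eqtl, n_snps)).astype(float)
    g_gwas = rng.binomial(2, 0.3, (n_gwas, n_snps)).astype(float)
    # Model 1: gene expression depends on genotype.
    expr = g_eqtl @ gamma + rng.normal(0.0, 1.0, n_eqtl)
    # Model 2: phenotype depends on the genetically driven expression.
    pheno = alpha * (g_gwas @ gamma) + rng.normal(0.0, 1.0, n_gwas)
    return g_eqtl, expr, g_gwas, pheno


def marginal_summary_stats(genotypes, phenotype):
    """Per-SNP simple-regression slope and standard error."""
    gc = genotypes - genotypes.mean(axis=0)
    yc = phenotype - phenotype.mean()
    ssx = (gc ** 2).sum(axis=0)
    beta = gc.T @ yc / ssx
    resid = yc[:, None] - gc * beta  # residuals of each single-SNP fit
    resid_var = (resid ** 2).sum(axis=0) / (len(yc) - 2)
    se = np.sqrt(resid_var / ssx)
    return beta, se


def two_stage_estimate(g_eqtl, expr, g_gwas, pheno, ridge=1.0):
    """Naive plug-in estimate of the expression-to-phenotype effect."""
    gc = g_eqtl - g_eqtl.mean(axis=0)
    n_snps = gc.shape[1]
    # Stage 1: ridge regression of expression on genotype (eQTL panel).
    weights = np.linalg.solve(
        gc.T @ gc + ridge * np.eye(n_snps), gc.T @ (expr - expr.mean())
    )
    # Stage 2: regress phenotype on predicted expression (GWAS cohort).
    pred = (g_gwas - g_gwas.mean(axis=0)) @ weights
    return pred @ (pheno - pheno.mean()) / (pred @ pred)


g_eqtl, expr, g_gwas, pheno = simulate()
beta_hat, se_hat = marginal_summary_stats(g_gwas, pheno)
z_scores = beta_hat / se_hat  # what a summary-statistics file typically reports
alpha_hat = two_stage_estimate(g_eqtl, expr, g_gwas, pheno)
print("first three z-scores:", np.round(z_scores[:3], 2))
print("plug-in effect estimate:", round(float(alpha_hat), 3))
```

Note that the second stage here still uses individual-level GWAS genotypes. The point of a summary-statistics method such as CoMM-S2 is to work from quantities like `beta_hat` and `se_hat` alone; how it does so is specified in the paper, not in this sketch.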
The implementation of CoMM-S2 is included in the CoMM package, which can be downloaded from https://github.com/gordonliu810822/CoMM.
Supplementary data are available at Bioinformatics online.