Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota, United States of America.
PLoS One. 2011;6(6):e20764. doi: 10.1371/journal.pone.0020764. Epub 2011 Jun 8.
One difficult question facing researchers is how to prioritize SNPs detected in genetic association studies for functional studies. Often a list of the top M SNPs is determined based solely on the p-value from an association analysis, where M is set by financial and time constraints. For many studies of complex diseases, multiple analyses have been completed, and integrating these multiple sets of results may be difficult. One may also wish to incorporate biological knowledge, such as whether the SNP is in the exon of a gene or in a regulatory region, into the selection of markers for follow-up. In this manuscript, we propose a Bayesian latent variable model (BLVM) that incorporates "features" of a SNP to estimate a latent "quality score", with SNPs prioritized based on the posterior probability distribution of the rankings of these quality scores. We illustrate the method using data from an ovarian cancer genome-wide association study (GWAS). In addition to the ovarian GWAS, we applied the BLVM to simulated data that mimic the prioritization of markers across multiple GWAS for related diseases/traits. The SNP ranked first by the BLVM for the ovarian GWAS ranked 2nd and 7th based on p-values from the analyses of all invasive cases and of invasive serous cases, respectively. The top SNP based on the serous case analysis p-value (which ranked 197th in the invasive case analysis) was ranked 8th based on the posterior probability of being in the top 5 markers (0.13). In summary, the BLVM allows for the systematic integration of multiple SNP "features" for the prioritization of loci for fine-mapping or functional studies, taking into account the uncertainty in ranking.
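The ranking step described in the abstract, prioritizing SNPs by the posterior probability of being among the top markers, can be illustrated with a minimal sketch. It assumes that posterior draws of the latent quality scores are already available and that a higher score means a better SNP; the abstract states neither detail. The function names `top_k_probabilities` and `simulate_draws` are hypothetical, and the Gaussian draws are placeholders for real MCMC output, not the authors' model or data.

```python
import random


def top_k_probabilities(draws, k):
    """Estimate, for each SNP, the posterior probability of ranking in the top k.

    draws: list of posterior draws; each draw is a list with one quality
           score per SNP (assumption: higher score = higher priority).
    """
    n_snps = len(draws[0])
    counts = [0] * n_snps
    for scores in draws:
        # Rank SNPs within this draw by descending quality score.
        order = sorted(range(n_snps), key=lambda j: scores[j], reverse=True)
        for j in order[:k]:
            counts[j] += 1
    # Fraction of draws in which each SNP landed in the top k.
    return [c / len(draws) for c in counts]


def simulate_draws(means, sd, n_draws, seed=0):
    """Placeholder for MCMC output: independent Gaussian draws per SNP."""
    rng = random.Random(seed)
    return [[rng.gauss(m, sd) for m in means] for _ in range(n_draws)]


# Hypothetical posterior means for six SNPs (illustrative values only).
means = [2.0, 1.5, 1.4, 0.2, 0.0, -0.3]
draws = simulate_draws(means, sd=1.0, n_draws=2000)
probs = top_k_probabilities(draws, k=2)

# Prioritize SNPs by their estimated probability of being in the top 2.
ranking = sorted(range(len(means)), key=lambda j: probs[j], reverse=True)
for j in ranking:
    print(f"SNP {j}: P(top 2) = {probs[j]:.3f}")
```

Ranking by this probability rather than by a single point estimate is what lets overlapping posteriors show up as uncertainty: a SNP whose score distribution overlaps heavily with its neighbors receives a modest probability, such as the 0.13 reported above, instead of a falsely precise rank.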