Province Michael A, Borecki Ingrid B
Division of Statistical Genomics, Box 8506, Center for Genome Sciences, Washington University School of Medicine, 4444 Forest Park Blvd, St. Louis, MO 63108, USA.
Pac Symp Biocomput. 2008:190-200.
Genome-wide association scan (GWAS) data mining has found moderate-effect "gold nugget" complex trait genes. But for many traits, much of the explanatory variance may be truly polygenic, more like gold dust, with small marginal effects that are undetectable by traditional methods. Yet the collective effects of these variants may be quite important in advancing personalized medicine. We consider a novel approach to sift out the genetic gold dust influencing quantitative (or qualitative) traits. Out of a GWAS, we randomly grab handfuls of single-nucleotide polymorphisms (SNPs), modeling their effects in a multiple linear (or logistic) regression. The model's significance is used to obtain an iteratively updated pseudo-Bayesian posterior probability associated with each SNP, and the draw-and-update step is repeated over many random handfuls until the distribution of these probabilities becomes stable. A stepwise procedure then culls the list of SNPs to define the final set. Results from a benchmark simulation of 5 quantitative trait genes among 1,000, in 1,000 random subjects, are contrasted with marginal tests using nominal significance, Bonferroni-corrected significance, and false discovery rates, as well as with serial selection methods. Random handfuls produced the best combination of sensitivity (0.95), specificity (0.99), and true positive rate (0.71) of all methods tested, and better replicability in an independent subject set. From more extensive simulations, we determine which combinations of signal-to-noise ratios, SNP typing densities, and sample sizes are tractable with which methods to gather the gold dust.
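The random-handfuls loop described in the abstract can be illustrated with a minimal sketch. The abstract does not give the pseudo-Bayesian update rule, the handful size, the stopping criterion, or the stepwise culling step, so everything below is an assumption made for illustration: each SNP's score is simply the running mean of (1 - p) over the handfuls that contained it, the model p-value comes from a large-sample likelihood-ratio test, a fixed number of draws replaces the stability check, and the names `random_handfuls`, `simulate`, `solve`, and `chi2_sf_even` are hypothetical. The simulated data set (60 SNPs, 5 of them causal, 300 subjects) is far smaller than the benchmark in the abstract.

```python
import math
import random


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def random_handfuls(geno, trait, handful=6, draws=600, seed=0):
    """Assumed scoring rule: mean of (1 - model p-value) over handfuls containing the SNP."""
    if handful % 2:
        raise ValueError("this sketch needs an even handful size")
    rng = random.Random(seed)
    n, p = len(trait), len(geno[0])
    # Center the trait and genotypes so the intercept drops out of each regression.
    y_mean = sum(trait) / n
    y = [v - y_mean for v in trait]
    cols = []
    for j in range(p):
        col = [row[j] for row in geno]
        mu = sum(col) / n
        cols.append([v - mu for v in col])
    # Cross-products are computed once; each handful reuses a sub-block of them.
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(c, y)) for c in cols]
    syy = sum(v * v for v in y)
    total, count = [0.0] * p, [0] * p
    for _ in range(draws):
        pick = rng.sample(range(p), handful)
        a = [[gram[i][j] for j in pick] for i in pick]
        b = [xty[i] for i in pick]
        beta = solve(a, b)
        rss = syy - sum(bi * wi for bi, wi in zip(b, beta))
        # Likelihood-ratio test of the whole handful against the intercept-only model.
        lr = max(0.0, n * math.log(syy / rss))
        p_model = chi2_sf_even(lr, handful)
        for j in pick:
            total[j] += 1.0 - p_model
            count[j] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]


def simulate(n=300, p=60, causal=(3, 17, 28, 41, 55), effect=1.0, seed=1):
    """Independent SNPs coded 0/1/2 (allele frequency 0.5) and an additive trait."""
    rng = random.Random(seed)
    geno = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(p)] for _ in range(n)]
    trait = [effect * sum(row[j] for j in causal) + rng.gauss(0.0, 1.0) for row in geno]
    return geno, trait


geno, trait = simulate()
scores = random_handfuls(geno, trait)
top5 = sorted(sorted(range(len(scores)), key=lambda j: -scores[j])[:5])
print("highest-scoring SNP indices:", top5)
```

In this toy setting, a handful that happens to contain a causal SNP yields a highly significant model, so causal SNPs accumulate scores near 1 while null SNPs are pulled up only when they are co-drawn with a causal one. The stepwise culling and the comparison against marginal tests reported in the abstract are not reproduced here.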