Hartley Stephen W, Monti Stefano, Liu Ching-Ti, Steinberg Martin H, Sebastiani Paola
Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.
Front Genet. 2012 Sep 11;3:176. doi: 10.3389/fgene.2012.00176. eCollection 2012.
Genome-wide association studies (GWAS) have identified numerous associations between genetic loci and individual phenotypes; however, relatively few GWAS have attempted to detect pleiotropic associations, in which loci are simultaneously associated with multiple distinct phenotypes. We show that pleiotropic associations can be modeled directly by constructing simple Bayesian networks, and that these models can be used to build single Bayesian classifiers, or ensembles of them, that leverage pleiotropy to improve genetic risk prediction. The proposed method has two phases: (1) Bayesian model comparison, to identify single-nucleotide polymorphisms (SNPs) associated with one or more traits; and (2) cross-validation feature selection, in which a final set of SNPs is selected to optimize prediction. To demonstrate the capabilities and limitations of the method, a total of 1600 case-control GWAS datasets with two dichotomous phenotypes were simulated under 16 scenarios, varying the association strengths of causal SNPs, the size of the discovery sets, the balance between cases and controls, and the number of pleiotropic causal SNPs. Across the 16 scenarios, prediction accuracy ranged from 50% to 90%. In the 14 scenarios that included pleiotropically associated SNPs, the pleiotropic model search and prediction methods consistently outperformed the naive model search and prediction. In the two scenarios with no true pleiotropic SNPs, the differences between the pleiotropic and naive model searches were minimal. To further evaluate the method on real data, a discovery set of 1071 sickle cell disease (SCD) patients was used to search for pleiotropic associations between cerebral vascular accidents and fetal hemoglobin level. Classification was then performed on a smaller validation set of 352 SCD patients; it showed that including pleiotropic SNPs may slightly improve prediction, although the difference was not statistically significant.
The proposed method is robust, computationally efficient, and provides a powerful new approach for detecting and modeling pleiotropic disease loci.
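As an illustration of phase (1), the sketch below scores four candidate models for a single SNP and two dichotomous phenotypes: no association, association with phenotype A only, with phenotype B only, and with both (pleiotropic). It is a minimal sketch under stated assumptions rather than the authors' implementation: it assumes the SNP genotype (coded 0/1/2) is modeled conditionally on the phenotypes, uses a Dirichlet-multinomial marginal likelihood with a symmetric prior `alpha`, and treats all models as equally likely a priori. The function names `log_marginal` and `compare_models` are hypothetical.

```python
from math import lgamma


def log_marginal(genotypes, parents, alpha=1.0):
    """Log Dirichlet-multinomial marginal likelihood of genotypes (0/1/2),
    computed separately within each parent (phenotype) configuration.
    The symmetric prior `alpha` is an assumption made for this sketch."""
    counts = {}
    for g, p in zip(genotypes, parents):
        counts.setdefault(p, [0, 0, 0])[g] += 1
    total = 0.0
    for c in counts.values():
        n = sum(c)
        total += lgamma(3 * alpha) - lgamma(3 * alpha + n)
        total += sum(lgamma(alpha + k) - lgamma(alpha) for k in c)
    return total


def compare_models(genotypes, pheno_a, pheno_b, alpha=1.0):
    """Return the log marginal likelihood of each candidate model for one SNP."""
    n = len(genotypes)
    parent_sets = {
        "none": [()] * n,                      # SNP independent of both traits
        "A": [(a,) for a in pheno_a],          # associated with trait A only
        "B": [(b,) for b in pheno_b],          # associated with trait B only
        "A+B": list(zip(pheno_a, pheno_b)),    # pleiotropic: depends on both
    }
    return {name: log_marginal(genotypes, ps, alpha)
            for name, ps in parent_sets.items()}


if __name__ == "__main__":
    # Toy data (fabricated for illustration only): genotype = A + B.
    pheno_a = [a for a in (0, 1) for _ in (0, 1) for _ in range(50)]
    pheno_b = [b for _ in (0, 1) for b in (0, 1) for _ in range(50)]
    genotypes = [a + b for a, b in zip(pheno_a, pheno_b)]
    scores = compare_models(genotypes, pheno_a, pheno_b)
    print(max(scores, key=scores.get), scores)
```

With equal prior model probabilities, the model with the highest log marginal likelihood is also the one with the highest posterior probability, so a SNP whose best-scoring model is `"A+B"` would be flagged as a candidate pleiotropic association under these assumptions. Because the marginal likelihood integrates over parameters, the richer `"A+B"` model is penalized automatically when the data do not support it.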