Lock Eric F, Dunson David B
Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A.
Department of Statistical Science, Duke University, Durham, North Carolina 27708, U.S.A.
Biometrics. 2017 Sep;73(3):1018-1028. doi: 10.1111/biom.12649. Epub 2017 Jan 12.
High-throughput genetic and epigenetic data are often screened for associations with an observed phenotype. For example, one may wish to test hundreds of thousands of genetic variants, or DNA methylation sites, for an association with disease status. These genomic variables can naturally be grouped by the gene they encode, among other criteria. However, standard practice in such applications is independent screening with a universal correction for multiplicity. We propose a Bayesian approach in which the prior probability of an association for a given genomic variable depends on its gene, and the gene-specific probabilities are modeled nonparametrically. This hierarchical model allows for appropriate gene and genome-wide multiplicity adjustments, and can be incorporated into a variety of Bayesian association screening methodologies with negligible increase in computational complexity. We describe an application to screening for differences in DNA methylation between lower grade glioma and glioblastoma multiforme tumor samples from The Cancer Genome Atlas. Software is available via the package BayesianScreening for R: github.com/lockEF/BayesianScreening.
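As a minimal illustration of why a gene-dependent prior matters, the sketch below converts a Bayes factor into a posterior probability of association under two different prior probabilities. It is not the BayesianScreening implementation: the abstract describes gene-specific probabilities that are modeled nonparametrically, whereas here they are fixed, hypothetical values, and the gene names, site names, Bayes factors, and the function `posterior_association_prob` are all assumptions made for illustration.

```python
def posterior_association_prob(bayes_factor, prior_prob):
    """Posterior probability of association for one genomic variable.

    bayes_factor: evidence for association relative to no association.
    prior_prob:   prior probability of association (here, gene-specific).
    """
    if not 0.0 <= prior_prob <= 1.0:
        raise ValueError("prior_prob must lie in [0, 1]")
    if bayes_factor < 0.0:
        raise ValueError("bayes_factor must be non-negative")
    weighted = prior_prob * bayes_factor
    return weighted / (weighted + (1.0 - prior_prob))


# Hypothetical gene-specific prior probabilities (fixed here for
# illustration; in the approach described above they are modeled
# nonparametrically rather than set by hand).
gene_prior = {"GENE_A": 0.50, "GENE_B": 0.01}

# Hypothetical variables: (name, gene, Bayes factor).
sites = [
    ("site1", "GENE_A", 10.0),
    ("site2", "GENE_B", 10.0),
]

# Same evidence, different genes, different posterior probabilities.
posteriors = {
    name: posterior_association_prob(bf, gene_prior[gene])
    for name, gene, bf in sites
}

for name, prob in posteriors.items():
    print(f"{name}: posterior probability of association = {prob:.3f}")
```

With identical Bayes factors, the variable in the gene with the higher prior probability receives a much higher posterior probability, which is the sense in which a gene-level prior adjusts for multiplicity within and across genes.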