Advanced Analytics Division, SAS Institute Inc., Cary, NC 27513; Molecular and Genetic Epidemiology Section, Epidemiology Branch and Laboratory of Molecular Carcinogenesis, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709; and Department of Statistical Science, Duke University, Durham, NC 27708.
Bioinformatics. 2014 Jun 1;30(11):1562-8. doi: 10.1093/bioinformatics/btu040. Epub 2014 Feb 5.
Estimating a phenotype distribution conditional on a set of discrete-valued predictors is a commonly encountered task. For example, interest may be in how the density of a quantitative trait varies with single nucleotide polymorphisms and patient characteristics. The subset of important predictors is usually not known in advance. The task becomes more challenging when the predictor set is high-dimensional and the predictors may interact.
We demonstrate a novel non-parametric Bayes method based on a tensor factorization of predictor-dependent weights for Gaussian kernels. The method uses multistage predictor selection for dimension reduction, providing succinct models for the phenotype distribution. The resulting conditional density morphs flexibly with the selected predictors. In a simulation study and an application to molecular epidemiology data, we demonstrate advantages over commonly used methods.
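As a rough illustration of what "predictor-dependent weights for Gaussian kernels" can mean, the sketch below evaluates a conditional density of the form f(y | x) = sum_h w_h(x) N(y; mean_h, sd_h^2), where the weights are looked up from the levels of a few selected predictors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `conditional_density`, the dense weight array, the number of kernels, and all numeric values are hypothetical, and the tensor factorization and Bayesian predictor selection described above are not implemented here.

```python
import numpy as np

def conditional_density(y, x, weights, means, sds, selected):
    """Evaluate f(y | x) = sum_h w_h(x) * Normal(y; means[h], sds[h]).

    Assumption for illustration: `weights` is a dense array indexed by the
    levels of the selected predictors, with the kernel index on the last axis.
    """
    # Only the selected predictors index the weight array; the rest are ignored.
    idx = tuple(x[j] for j in selected)
    w = weights[idx]  # shape (H,), non-negative, sums to 1
    y = np.atleast_1d(np.asarray(y, dtype=float))[:, None]
    # Gaussian kernel values, shape (len(y), H).
    kernels = np.exp(-0.5 * ((y - means) / sds) ** 2) / (sds * np.sqrt(2.0 * np.pi))
    return kernels @ w

# Hypothetical setup: 5 SNP-like predictors with 3 levels each, of which
# only predictors 0 and 3 are treated as selected; H = 2 Gaussian kernels.
rng = np.random.default_rng(0)
weights = rng.dirichlet(np.ones(2), size=(3, 3))  # shape (3, 3, 2)
means = np.array([-1.0, 2.0])
sds = np.array([0.5, 1.0])
selected = [0, 3]

grid = np.linspace(-8.0, 10.0, 2001)
dens = conditional_density(grid, [2, 0, 1, 1, 0], weights, means, sds, selected)
```

Because the weights depend only on the selected predictors, changing the level of any other predictor leaves the density unchanged, which is the sense in which predictor selection yields a more succinct model. The kernels themselves are shared across predictor levels; only the mixing weights move.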