Dahl Andrew, Iotchkova Valentina, Baud Amelie, Johansson Åsa, Gyllensten Ulf, Soranzo Nicole, Mott Richard, Kranis Andreas, Marchini Jonathan
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
Human Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.
Nat Genet. 2016 Apr;48(4):466-72. doi: 10.1038/ng.3513. Epub 2016 Feb 22.
Genetic association studies have yielded a wealth of biological discoveries. However, these studies have mostly analyzed one trait and one SNP at a time, thus failing to capture the underlying complexity of the data sets. Joint genotype-phenotype analyses of complex, high-dimensional data sets have great potential and represent an important way to move beyond simple genome-wide association studies (GWAS). The move to high-dimensional phenotypes will raise many new statistical problems. Here we address the central issue of missing phenotypes in studies with any level of relatedness between samples. We propose a multiple-phenotype mixed model and use a computationally efficient variational Bayesian algorithm to fit the model. On a variety of simulated and real data sets from a range of organisms and trait types, we show that our method outperforms existing state-of-the-art methods from the statistics and machine learning literature and can boost signals of association.
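To make the idea of imputing missing phenotypes under a multiple-phenotype mixed model concrete, here is a minimal sketch in Python. It is not the algorithm from the paper: the abstract describes fitting the model with a variational Bayesian algorithm, whereas this sketch assumes the model parameters are already known and only performs the conditional-mean step. The assumed model is vec(Y) ~ N(0, B ⊗ K + E ⊗ I), where Y is an N × P matrix of mean-centered phenotypes, K is an N × N kinship matrix, and B and E are P × P genetic and residual trait covariances. The function name `impute_conditional_mean` and all parameter names are hypothetical, and the dense NP × NP covariance is only practical for small examples.

```python
import numpy as np

def impute_conditional_mean(Y, K, B, E):
    """Fill NaNs in Y (N x P) with their conditional mean, assuming
    vec(Y) ~ N(0, kron(B, K) + kron(E, I_N)) with known K, B and E.

    Illustrative only: the covariances are taken as given here,
    whereas a real method would have to estimate them from the data.
    """
    N, P = Y.shape
    # Column-major vec: all samples for trait 0, then trait 1, and so on.
    y = Y.reshape(-1, order="F")
    # Entry ((p, n), (q, m)) of the covariance is B[p, q] * K[n, m] + E[p, q] * [n == m].
    cov = np.kron(B, K) + np.kron(E, np.eye(N))

    miss = np.isnan(y)
    obs = ~miss
    if not miss.any():
        return Y.copy()

    S_oo = cov[np.ix_(obs, obs)]    # covariance among observed entries
    S_mo = cov[np.ix_(miss, obs)]   # covariance between missing and observed entries

    y_filled = y.copy()
    # Gaussian conditional mean for a zero-mean model: S_mo S_oo^{-1} y_obs
    y_filled[miss] = S_mo @ np.linalg.solve(S_oo, y[obs])
    return y_filled.reshape(N, P, order="F")

# Toy example: three unrelated samples (K = I), two correlated traits.
Y = np.array([[2.0, np.nan],
              [1.0, -1.0],
              [0.0, 3.0]])
K = np.eye(3)
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])
E = np.eye(2)
Y_imputed = impute_conditional_mean(Y, K, B, E)
```

In this toy case the samples are unrelated, so the missing value is predicted only from the same individual's other trait. With a non-identity K, related individuals' phenotypes would also contribute to the prediction, which is the situation the abstract highlights.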