Fusi Nicolo, Lippert Christoph, Lawrence Neil D, Stegle Oliver
eScience Group, Microsoft Research, Los Angeles, California 90024, USA.
Department of Computer Science, University of Sheffield, Sheffield S10 2HQ, UK.
Nat Commun. 2014 Sep 19;5:4890. doi: 10.1038/ncomms5890.
Linear mixed models (LMMs) are a powerful and established tool for studying genotype-phenotype relationships. A limitation of the LMM is that it assumes Gaussian-distributed residuals, a requirement that rarely holds in practice. Violations of this assumption can lead to false conclusions and a loss of power. To mitigate this problem, it is common practice to pre-process the phenotypic values to make them as Gaussian as possible, for instance by applying logarithmic or other nonlinear transformations. Unfortunately, different phenotypes require different transformations, and choosing an appropriate transformation is challenging and subjective. Here we present an extension of the LMM that estimates an optimal transformation from the observed data. In simulations and in applications to real data from human, mouse and yeast, we show that using transformations inferred by our model increases power in genome-wide association studies and improves the accuracy of heritability estimation and phenotype prediction.
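The idea of estimating a transformation from the data, rather than choosing one by hand, can be illustrated with a much simpler stand-in than the model described in the abstract. The sketch below fits a one-parameter Box-Cox transformation to a simulated, log-normally distributed phenotype by maximizing a Gaussian profile likelihood over a grid. It is not the authors' method: it ignores genotypes and the random effects of an LMM entirely, and the function names, the grid range and the simulated data are all assumptions made for illustration.

```python
import math
import random


def boxcox(y, lam):
    """Box-Cox transform of strictly positive values y for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def profile_loglik(y, lam):
    """Gaussian profile log-likelihood of lambda, additive constants dropped.

    The (lam - 1) * sum(log y) term is the log-Jacobian of the transform.
    Without it, likelihoods for different lambdas are not comparable.
    """
    n = len(y)
    z = boxcox(y, lam)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in y)


def estimate_lambda(y, lo=-2.0, hi=2.0, steps=401):
    """Grid search for the lambda that maximizes the profile likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: profile_loglik(y, lam))


def skewness(x):
    """Sample skewness (third standardized moment); 0 for symmetric data."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


# Simulated phenotype (hypothetical data): log-normal, so the log transform
# (lambda = 0) is the one that makes it Gaussian.
rng = random.Random(0)
phenotype = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(500)]

best_lam = estimate_lambda(phenotype)
transformed = boxcox(phenotype, best_lam)

print("estimated lambda:", round(best_lam, 2))
print("skewness before:", round(skewness(phenotype), 2))
print("skewness after:", round(skewness(transformed), 2))
```

Because the simulated phenotype is log-normal, the estimated lambda should land close to 0 (the log transform), and the skewness of the transformed values should be much closer to zero than that of the raw values. A single-parameter family like Box-Cox is chosen here only because it keeps the example short. The abstract does not specify the form of the transformation used in the paper.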