Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany.
Bioinformatics. 2013 Jan 15;29(2):206-14. doi: 10.1093/bioinformatics/bts669. Epub 2012 Nov 22.
Exploring the genetic basis of heritable traits remains one of the central challenges in biomedical research. In traits with simple Mendelian architectures, single polymorphic loci explain a significant fraction of the phenotypic variability. However, many traits of interest seem to be subject to multifactorial control by groups of genetic loci. Accurate detection of such multivariate associations is non-trivial and often compromised by limited statistical power. At the same time, confounding influences, such as population structure, cause spurious association signals that result in false-positive findings.
We propose LMM-Lasso, a linear mixed model that allows for both multi-locus mapping and correction for confounding effects. Our approach is simple and free of tuning parameters; it effectively controls for population structure and scales to genome-wide datasets. LMM-Lasso simultaneously discovers likely causal variants and allows for multi-marker-based prediction of phenotype from genotype. We demonstrate the practical use of LMM-Lasso in genome-wide association studies in Arabidopsis thaliana and in linkage mapping in mouse, where our method achieves significantly more accurate phenotype prediction for 91% of the considered phenotypes. At the same time, our model dissects the phenotypic variability into components that result from individual single nucleotide polymorphism (SNP) effects and from population structure. Enrichment of known candidate genes suggests that the individual associations retrieved by LMM-Lasso are likely to be genuine.
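To make the idea of combining a sparse multi-locus model with a confounder correction concrete, here is a minimal sketch in NumPy. It assumes the model y = Xβ + u + ε, with a random effect u ~ N(0, σ_g² K) capturing population structure through a kinship matrix K and iid noise ε ~ N(0, σ_e² I). It follows a two-step strategy: estimate the variance ratio δ = σ_e²/σ_g² once under a null model without SNP effects, use the eigendecomposition K = U S Uᵀ to rotate and rescale the data so the residual covariance becomes approximately isotropic, and then run an ordinary Lasso (coordinate descent) on the transformed data. The function names `lmm_lasso_sketch`, `fit_null_delta`, and `lasso_cd`, the log-δ grid, and the rule of setting the Lasso penalty to a fixed fraction `lam_ratio` of λ_max are hypothetical choices made for this illustration. They are not the authors' released code or their procedure for selecting the regularization strength.

```python
import numpy as np


def fit_null_delta(y_rot, S, log_deltas):
    """Grid-search delta = sigma_e^2 / sigma_g^2 under the null model
    y ~ N(0, sigma_g^2 (K + delta I)), with sigma_g^2 profiled out (ML)."""
    n = y_rot.size
    best_ll, best_delta = -np.inf, None
    for ld in log_deltas:
        d = S + np.exp(ld)                      # per-component variance / sigma_g^2
        sigma_g2 = np.mean(y_rot ** 2 / d)      # ML estimate of sigma_g^2 given delta
        ll = -0.5 * (n * np.log(2 * np.pi) + np.sum(np.log(sigma_g2 * d)) + n)
        if ll > best_ll:
            best_ll, best_delta = ll, np.exp(ld)
    return best_delta


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    resid = y.copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            rho = X[:, j] @ resid + col_sq[j] * old    # partial-residual correlation
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def lmm_lasso_sketch(X, y, K, lam_ratio=0.1, log_deltas=None):
    """Hypothetical two-step LMM + Lasso: null-model delta, rotation, then Lasso."""
    if log_deltas is None:
        log_deltas = np.linspace(-5.0, 5.0, 101)
    X = X - X.mean(axis=0)                      # centre instead of fitting an intercept
    y = y - y.mean()
    S, U = np.linalg.eigh(K)
    S = np.clip(S, 0.0, None)                   # guard against tiny negative eigenvalues
    y_rot = U.T @ y
    delta = fit_null_delta(y_rot, S, log_deltas)
    w = 1.0 / np.sqrt(S + delta)                # whitening weights (S + delta I)^(-1/2)
    y_t = w * y_rot
    X_t = w[:, None] * (U.T @ X)
    lam = lam_ratio * np.max(np.abs(X_t.T @ y_t))   # penalty relative to lambda_max
    return lasso_cd(X_t, y_t, lam), delta


# Toy data: 200 individuals, 50 SNP-like features, 3 causal, plus a structured effect.
rng = np.random.default_rng(0)
n, p, m = 200, 50, 500
G = rng.standard_normal((n, m))
K = G @ G.T / m                                 # kinship-like covariance
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[3, 17, 42]] = [3.0, -2.0, 2.5]
u = 1.5 * (G @ rng.standard_normal(m)) / np.sqrt(m)   # u ~ N(0, 1.5^2 K)
y = X @ beta_true + u + rng.standard_normal(n)
beta_hat, delta = lmm_lasso_sketch(X, y, K)
```

Because the rotation by Uᵀ and the rescaling by (S + δI)^(-1/2) are invertible row operations, the sparse effects β are unchanged by the transformation; only the error structure is whitened, which is what lets a standard Lasso solver be reused unchanged.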
Code is available at http://webdav.tuebingen.mpg.de/u/karsten/Forschung/research.html.
rakitsch@tuebingen.mpg.de, ippert@microsoft.com or stegle@ebi.ac.uk
Supplementary data are available at Bioinformatics online.