Department of Statistics, The University of Auckland, Auckland, New Zealand.
Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan.
Stat Med. 2020 Apr 30;39(9):1311-1327. doi: 10.1002/sim.8477. Epub 2020 Jan 27.
Linear mixed models (LMMs) and their extensions have been widely used for high-dimensional genomic data analyses. While LMMs hold great promise for risk prediction research, the high dimensionality of the data and the differing effect sizes of genomic regions pose substantial analytical and computational challenges. In this work, we present a multikernel linear mixed model with adaptive lasso (KLMM-AL) to predict phenotypes using high-dimensional genomic data. We develop two algorithms for estimating the model parameters and also establish the asymptotic properties of the LMM with adaptive lasso when only one dependent observation is available. The proposed KLMM-AL can account for heterogeneous effect sizes from different genomic regions, capture both additive and nonadditive genetic effects, and adaptively and efficiently select predictive genomic regions and their corresponding effects. Through simulation studies, we demonstrate that KLMM-AL outperforms most existing methods. Moreover, KLMM-AL achieves high sensitivity and specificity in selecting predictive genomic regions. KLMM-AL is further illustrated by an application to the sequencing dataset obtained from the Alzheimer's Disease Neuroimaging Initiative.
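To make the general idea concrete, the following is a minimal sketch of how a multikernel covariance and an adaptive-lasso-style penalty on region-level variance components could be written down. It is not the authors' estimation algorithm: the linear kernel, the choice to penalize the variance components, the zero-mean likelihood, and all names (`linear_kernel`, `marginal_cov`, `penalized_nll`, `tau`, `lam`) are assumptions made for illustration, and the data are simulated.

```python
import numpy as np

def linear_kernel(G):
    """Linear (additive) kernel G G^T / p for an n x p genotype block (assumed kernel choice)."""
    return G @ G.T / G.shape[1]

def marginal_cov(kernels, tau, sigma2_e):
    """Marginal covariance V = sum_r tau_r * K_r + sigma2_e * I, one kernel per region."""
    n = kernels[0].shape[0]
    V = sigma2_e * np.eye(n)
    for t, K in zip(tau, kernels):
        V = V + t * K
    return V

def penalized_nll(y, kernels, tau, sigma2_e, lam, weights):
    """Zero-mean Gaussian negative log-likelihood (up to a constant)
    plus a weighted L1 penalty on the variance components (hypothetical objective)."""
    V = marginal_cov(kernels, tau, sigma2_e)
    _, logdet = np.linalg.slogdet(V)
    quad = y @ np.linalg.solve(V, y)
    return 0.5 * (logdet + quad) + lam * np.sum(weights * np.abs(tau))

# Simulated toy data: 3 genomic regions, 10 variants each, 50 subjects.
rng = np.random.default_rng(0)
n = 50
regions = [rng.integers(0, 3, size=(n, 10)).astype(float) for _ in range(3)]
kernels = [linear_kernel(G - G.mean(axis=0)) for G in regions]
y = rng.standard_normal(n)

# Adaptive weights: inverse of an initial estimate, so regions with small
# initial variance components are penalized more heavily.
tau_init = np.array([0.5, 0.1, 0.01])
weights = 1.0 / tau_init

value = penalized_nll(y, kernels, tau_init, 1.0, lam=0.1, weights=weights)
```

In this sketch a region whose variance component is shrunk to zero drops out of the covariance, which is one way region selection could arise; nonadditive effects would require swapping in a nonlinear kernel for `linear_kernel`.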