Department of Statistics, Beijing Normal University, Zhuhai, 519000, China.
Department of Statistics, Florida State University, Tallahassee, Florida 32312, United States.
Biometrics. 2024 Jul 1;80(3). doi: 10.1093/biomtc/ujae103.
We address the challenge of estimating regression coefficients and selecting relevant predictors in the context of mixed linear regression in high dimensions, where the number of predictors greatly exceeds the sample size. Recent advancements in this field have centered on incorporating sparsity-inducing penalties into the expectation-maximization (EM) algorithm, which seeks to maximize the conditional likelihood of the response given the predictors. However, existing procedures often treat predictors as fixed or overlook their inherent variability. In this paper, we leverage the independence between the predictor and the latent indicator variable of mixtures to facilitate efficient computation and also achieve synergistic variable selection across all mixture components. We establish the non-asymptotic convergence rate of the proposed fast group-penalized EM estimator to the true regression parameters. The effectiveness of our method is demonstrated through extensive simulations and an application to the Cancer Cell Line Encyclopedia dataset for the prediction of anticancer drug sensitivity.
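The abstract does not spell out the estimator's update rules. The sketch below illustrates the general idea of a group-penalized EM for mixed linear regression under our own simplifying assumptions, and it is not the authors' fast estimator. It does not use the predictor/indicator independence the paper exploits. We assume K components with a shared noise variance. A row-wise group-lasso penalty on the p x K coefficient matrix B means each predictor is kept or dropped in all components at once. The M-step runs a fixed number of proximal-gradient steps. All names and parameters (`group_penalized_em`, `group_soft_threshold`, `lam`, `n_iter`, `n_prox`) are hypothetical.

```python
import numpy as np


def group_soft_threshold(B, t):
    """Row-wise group soft-thresholding (prox of t * sum_j ||B_j||_2).

    Each row of B holds one predictor's coefficients across all K components;
    a row whose Euclidean norm is at most t is set exactly to zero.
    """
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * B


def group_penalized_em(X, y, K=2, lam=0.1, n_iter=100, n_prox=20, seed=0):
    """Hypothetical sketch: EM for y = x^T beta_k + eps, eps ~ N(0, sigma^2),
    P(Z = k) = pi_k, with a group-lasso penalty coupling the K components."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    B = 0.1 * rng.standard_normal((p, K))   # p x K coefficient matrix
    pi = np.full(K, 1.0 / K)                # mixing proportions
    sigma2 = np.var(y)                       # shared noise variance (assumption)
    # Step size 1/L: each component's Hessian X^T diag(gamma_k) X / n is
    # bounded by X^T X / n because responsibilities lie in [0, 1].
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, k] = P(Z_i = k | x_i, y_i).
        resid = y[:, None] - X @ B                          # n x K residuals
        logw = np.log(np.maximum(pi, 1e-12)) - resid ** 2 / (2 * sigma2)
        logw -= logw.max(axis=1, keepdims=True)             # numerical stability
        gamma = np.exp(logw)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step for B: proximal gradient on
        # (1/(2n)) sum_k sum_i gamma_ik (y_i - x_i^T beta_k)^2 + lam * sum_j ||B_j||_2
        for _ in range(n_prox):
            resid = y[:, None] - X @ B
            grad = -X.T @ (gamma * resid) / n                # p x K gradient
            B = group_soft_threshold(B - step * grad, step * lam)
        # M-step for the mixing proportions and the shared variance.
        pi = gamma.mean(axis=0)
        resid = y[:, None] - X @ B
        sigma2 = max(np.sum(gamma * resid ** 2) / n, 1e-8)
    return B, pi, sigma2, gamma
```

Calling `group_penalized_em(X, y, K=2, lam=0.1)` returns the coefficient matrix, mixing proportions, noise variance and final responsibilities. Rows of `B` that are exactly zero correspond to predictors dropped from every component simultaneously, which is the kind of joint selection across components that a group penalty is meant to produce.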