He Baihua, Ma Shuangge, Zhang Xinyu, Zhu Li-Xing
International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, China.
Department of Biostatistics, Yale University, New Haven, CT.
J Am Stat Assoc. 2023;118(544):2658-2670. doi: 10.1080/01621459.2022.2070070. Epub 2022 Jul 7.
Model averaging is an effective way to enhance prediction accuracy. However, most previous work focuses on low-dimensional settings with completely observed responses. To accurately predict the risk effect for survival data with high-dimensional predictors, we propose a novel method: rank-based greedy (RG) model averaging. Specifically, adopting transformation models with split predictors as working models, we use the smooth concordance index function twice: once to derive the candidate predictions and once to derive the optimal model weights. The final prediction is the weighted average of all candidates. Our approach is flexible, computationally efficient, and robust against model misspecification, as it neither requires the joint model to be correct nor involves estimating the transformation function. We further adopt a greedy algorithm to handle high dimensions. Theoretically, we derive an asymptotic error bound for the optimal weights under some mild conditions. In addition, when the candidate submodels include correct models, the sum of the weights assigned to the correct candidate submodels is proven to approach one in probability. Extensive numerical studies on both simulated and real datasets show the robust performance of the proposed approach compared with existing regularization approaches. Supplementary materials for this article are available online.
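To make the two uses of the concordance index concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes a sigmoid-smoothed concordance index with a hypothetical bandwidth `h`, and it swaps in a simple greedy forward selection with replacement as a stand-in for the paper's greedy algorithm, whose details the abstract does not give. The names `smooth_cindex` and `greedy_weights` are illustrative, and the candidate predictions are taken as given rather than fitted.

```python
import numpy as np


def smooth_cindex(time, event, score, h=0.1):
    """Sigmoid-smoothed concordance index for right-censored data.

    Assumption: a higher score means higher risk (shorter survival).
    h is a hypothetical smoothing bandwidth, not a value from the paper.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    score = np.asarray(score, dtype=float)
    # comparable[i, j]: subject i has an observed event before time[j]
    comparable = event[:, None] & (time[:, None] < time[None, :])
    n_pairs = comparable.sum()
    if n_pairs == 0:
        raise ValueError("no comparable pairs")
    diff = (score[:, None] - score[None, :]) / h
    # Numerically stable sigmoid replaces the indicator I(score_i > score_j)
    smooth_ind = 0.5 * (1.0 + np.tanh(0.5 * diff))
    return float((smooth_ind * comparable).sum() / n_pairs)


def greedy_weights(time, event, candidates, n_steps=20, h=0.1):
    """Greedy forward selection of averaging weights on the simplex.

    candidates: array of shape (n_samples, n_models), one column of
    risk predictions per candidate submodel. At each step, the model
    whose inclusion gives the highest smooth C-index gets one more vote.
    """
    candidates = np.asarray(candidates, dtype=float)
    n_models = candidates.shape[1]
    counts = np.zeros(n_models)
    for _ in range(n_steps):
        best_k, best_val = 0, -np.inf
        for k in range(n_models):
            trial = counts.copy()
            trial[k] += 1
            w = trial / trial.sum()
            val = smooth_cindex(time, event, candidates @ w, h)
            if val > best_val:
                best_k, best_val = k, val
        counts[best_k] += 1
    # Vote shares are nonnegative and sum to one
    return counts / counts.sum()
```

Given observed times, event indicators, and a matrix of candidate risk scores, `greedy_weights(time, event, candidates)` returns nonnegative weights that sum to one, and `candidates @ weights` gives the averaged prediction. Because the criterion depends only on the ordering of the scores, no transformation function has to be estimated, which matches the rank-based motivation described above.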