Kawaguchi Eric S, Shen Jenny I, Suchard Marc A, Li Gang
Department of Preventive Medicine, University of Southern California.
Division of Nephrology and Hypertension, Los Angeles Biomedical Institute at Harbor-UCLA Medical Center.
J Comput Graph Stat. 2021;30(3):685-693. doi: 10.1080/10618600.2020.1841650. Epub 2020 Dec 11.
This paper develops two orthogonal contributions to scalable sparse regression for competing risks time-to-event data. First, we study and accelerate the broken adaptive ridge method (BAR), a surrogate ℓ₀-based iteratively reweighted ℓ₂-penalization algorithm that achieves sparsity in its limit, in the context of the Fine-Gray (1999) proportional subdistributional hazards (PSH) model. In particular, we derive a new algorithm for BAR regression, named cycBAR, that performs cyclic updates of each coordinate using an explicit thresholding formula. The new cycBAR algorithm avoids fitting multiple reweighted ℓ₂-penalizations and thus yields substantial speedups over the original BAR algorithm. Second, we address a pivotal computational issue related to fitting the PSH model. Specifically, in current implementations the computational cost of the log-pseudo likelihood and its derivatives for the PSH model grows at the rate of O(n²) with the sample size n. We propose a novel forward-backward scan algorithm that reduces the computational cost to O(n). The proposed method applies to both unpenalized and penalized estimation for the PSH model and has exhibited drastic speedups over current implementations. Finally, combining the two algorithms can yield >1,000-fold speedups over the original BAR algorithm. The scalability of the proposed algorithms for large competing risks data is illustrated using both simulations and United States Renal Data System data. Supplementary materials for this article are available online.
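The reduction from quadratic to linear cost can be illustrated with a small sketch. It is not the authors' implementation; it only shows the general idea that the weighted risk-set sum in the PSH pseudo likelihood splits into a backward cumulative sum (subjects still at risk) and a forward cumulative sum (earlier competing events, reweighted by the censoring survival function). The function names, the cause coding (0 = censored, 1 = event of interest, 2 = competing event), the assumption of distinct observation times, and the array `G` of precomputed censoring survival estimates at each subject's time are all assumptions made for illustration.

```python
import numpy as np

def psh_denominators_naive(time, cause, eta, G):
    """O(n^2) reference: weighted risk-set sum for every subject i."""
    n = len(time)
    w = np.exp(eta)
    out = np.zeros(n)
    for i in range(n):
        total = 0.0
        for j in range(n):
            if time[j] >= time[i]:
                total += w[j]                  # still at risk, weight 1
            elif cause[j] == 2:
                total += w[j] * G[i] / G[j]    # earlier competing event, IPCW weight
        out[i] = total
    return out

def psh_denominators_scan(time, cause, eta, G):
    """Same sums via one backward and one forward scan over sorted data.

    Assumes distinct times. After the O(n log n) sort, both scans are O(n).
    """
    order = np.argsort(time, kind="stable")
    w = np.exp(eta[order])
    c = cause[order]
    g = G[order]
    # Backward scan: suffix sums of w over subjects with time_j >= time_i.
    at_risk = np.cumsum(w[::-1])[::-1]
    # Forward scan: exclusive prefix sums of w_j / G_j over earlier competing events.
    comp = np.where(c == 2, w / g, 0.0)
    earlier = np.cumsum(comp) - comp
    out_sorted = at_risk + g * earlier
    # Map results back to the original subject order.
    out = np.empty_like(out_sorted)
    out[order] = out_sorted
    return out
```

Because the factor G(t_i) does not depend on j, it can be pulled out of the inner sum, which is what allows the second term to be accumulated in a single forward pass. Handling tied times and the exact left-limit convention for G would need extra care that this sketch omits.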