Scalable Algorithms for Large Competing Risks Data.

Authors

Kawaguchi Eric S, Shen Jenny I, Suchard Marc A, Li Gang

Affiliations

Department of Preventive Medicine, University of Southern California.

Division of Nephrology and Hypertension, Los Angeles Biomedical Institute at Harbor-UCLA Medical Center.

Publication information

J Comput Graph Stat. 2021;30(3):685-693. doi: 10.1080/10618600.2020.1841650. Epub 2020 Dec 11.

DOI:10.1080/10618600.2020.1841650

PMID:35983577

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9385160/

Abstract

This paper makes two orthogonal contributions to scalable sparse regression for competing risks time-to-event data. First, we study and accelerate the broken adaptive ridge (BAR) method, a surrogate ℓ₀-based iteratively reweighted ℓ₂-penalization algorithm that achieves sparsity in its limit, in the context of the Fine-Gray (1999) proportional subdistributional hazards (PSH) model. In particular, we derive a new algorithm for BAR regression, named cycBAR, that performs a cyclic update of each coordinate using an explicit thresholding formula. The new cycBAR algorithm avoids fitting multiple reweighted ℓ₂-penalizations and thus yields substantial speedups over the original BAR algorithm. Second, we address a pivotal computational issue in fitting the PSH model. Specifically, in current implementations the cost of computing the log-pseudo likelihood and its derivatives for the PSH model grows at the rate of O(n²) with the sample size n. We propose a novel forward-backward scan algorithm that reduces this cost to O(n). The proposed method applies to both unpenalized and penalized estimation for the PSH model and has exhibited drastic speedups over current implementations. Finally, combining the two algorithms yields speedups of more than 1,000-fold over the original BAR algorithm. The scalability of the proposed algorithms for large competing risks data is illustrated using both simulations and a United States Renal Data System dataset. Supplementary materials for this article are available online.
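The two ideas in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation and it does not fit the PSH model. Part 1 uses a one-dimensional least-squares toy problem to show that repeated reweighted ridge updates converge to a value that can also be written in closed form as a thresholding rule, which is the kind of shortcut a cyclic coordinate update can exploit. The names a, b, and lam are assumptions standing for the coordinate's curvature, its cross-product term, and the tuning parameter. Part 2 assumes observations sorted by distinct times, status codes 0/1/2 for censored, event of interest, and competing event, and a hypothetical vector G of censoring-survival estimates at each time. It shows how one backward and one forward cumulative sum reproduce the weighted risk-set sums that a double loop computes in O(n²).

```python
import numpy as np

# --- Part 1: limit of a 1-D reweighted ridge iteration (toy least squares) ---

def bar_limit_iterative(a, b, lam, n_iter=500):
    """Iterate beta <- b / (a + lam / beta**2), starting from the unpenalized fit."""
    beta = b / a
    for _ in range(n_iter):
        # Algebraically the same update, written so that beta == 0 is safe.
        beta = b * beta**2 / (a * beta**2 + lam)
    return beta

def bar_limit_closed_form(a, b, lam):
    """Stable fixed point of the update above: zero below a threshold,
    otherwise the larger-magnitude root of a*beta**2 - b*beta + lam = 0."""
    disc = b * b - 4.0 * a * lam
    if disc < 0.0:
        return 0.0
    return (b + np.sign(b) * np.sqrt(disc)) / (2.0 * a)

# --- Part 2: weighted risk-set sums, double loop versus linear scans ---
# Assumed weights: 1 for subjects still at risk, G[i] / G[j] for subjects j
# that had a competing event (status 2) before time i.

def risk_sums_naive(time, status, eta, G):
    """O(n^2) reference implementation."""
    n = len(time)
    w = np.exp(eta)
    out = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if time[j] >= time[i]:
                out[i] += w[j]
            elif status[j] == 2:
                out[i] += w[j] * G[i] / G[j]
    return out

def risk_sums_scan(status, eta, G):
    """O(n) version; inputs must already be sorted by ascending, distinct time."""
    w = np.exp(eta)
    # Backward scan: sum of w[j] over j >= i.
    backward = np.cumsum(w[::-1])[::-1]
    # Forward scan: sum of w[j] / G[j] over earlier competing events j < i.
    comp = np.where(status == 2, w / G, 0.0)
    forward = np.cumsum(comp) - comp
    return backward + G * forward
```

On simulated inputs the two risk-sum functions should agree to floating-point tolerance, for example via np.allclose(risk_sums_scan(status, eta, G), risk_sums_naive(time, status, eta, G)). Tie handling and the exact form of the weights used in the paper may differ from this sketch.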

Similar articles

Broken adaptive ridge regression and its asymptotic properties.

J Multivar Anal. 2018 Nov;168:334-351. doi: 10.1016/j.jmva.2018.08.007. Epub 2018 Aug 23.

A surrogate ℓ₀ sparse Cox's regression with applications to sparse high-dimensional massive sample size time-to-event data.

Stat Med. 2020 Mar 15;39(6):675-686. doi: 10.1002/sim.8438. Epub 2019 Dec 8.

Penalized variable selection in competing risks regression.

Lifetime Data Anal. 2017 Jul;23(3):353-376. doi: 10.1007/s10985-016-9362-3. Epub 2016 Mar 26.

Fast Lasso-type safe screening for Fine-Gray competing risks model with ultrahigh dimensional covariates.

Stat Med. 2022 Oct 30;41(24):4941-4960. doi: 10.1002/sim.9545. Epub 2022 Aug 9.

High-dimensional feature selection in competing risks modeling: A stable approach using a split-and-merge ensemble algorithm.

Biom J. 2023 Feb;65(2):e2100164. doi: 10.1002/bimj.202100164. Epub 2022 Aug 7.

Simultaneous Estimation and Variable Selection for Interval-Censored Data with Broken Adaptive Ridge Regression.

J Am Stat Assoc. 2020;115(529):204-216. doi: 10.1080/01621459.2018.1537922. Epub 2019 Apr 22.

Regularized Weighted Nonparametric Likelihood Approach for High-Dimension Sparse Subdistribution Hazards Model for Competing Risk Data.

Comput Math Methods Med. 2021 Sep 19;2021:5169052. doi: 10.1155/2021/5169052. eCollection 2021.

Fast iteratively reweighted least squares algorithms for analysis-based sparse reconstruction.

Med Image Anal. 2018 Oct;49:141-152. doi: 10.1016/j.media.2018.08.002. Epub 2018 Aug 7.

Sparse Adaptive Iteratively-Weighted Thresholding Algorithm (SAITA) for Lp-Regularization Using the Multiple Sub-Dictionary Representation.

Sensors (Basel). 2017 Dec 15;17(12):2920. doi: 10.3390/s17122920.

Cited by

Massive Parallelization of Massive Sample-size Survival Analysis.

J Comput Graph Stat. 2024;33(1):289-302. doi: 10.1080/10618600.2023.2213279. Epub 2023 Jun 26.

Fitting the Cox proportional hazards model to big data.

Biometrics. 2024 Jan 29;80(1). doi: 10.1093/biomtc/ujae018.

Sure Joint Screening for High Dimensional Cox's Proportional Hazards Model Under the Case-Cohort Design.

J Comput Biol. 2023 Jun;30(6):663-677. doi: 10.1089/cmb.2022.0416. Epub 2023 May 3.

Gene Screening for Prognosis of Non-Muscle-Invasive Bladder Carcinoma under Competing Risks Endpoints.

Cancers (Basel). 2023 Jan 6;15(2):379. doi: 10.3390/cancers15020379.

Fast Lasso-type safe screening for Fine-Gray competing risks model with ultrahigh dimensional covariates.

Stat Med. 2022 Oct 30;41(24):4941-4960. doi: 10.1002/sim.9545. Epub 2022 Aug 9.

A New ℓ₀-Regularized Log-Linear Poisson Graphical Model with Applications to RNA Sequencing Data.

References

Variable selection for recurrent event data with broken adaptive ridge regression.

Can J Stat. 2018 Sep;46(3):416-428. doi: 10.1002/cjs.11459. Epub 2018 Aug 10.

Simultaneous Estimation and Variable Selection for Interval-Censored Data with Broken Adaptive Ridge Regression.

J Am Stat Assoc. 2020;115(529):204-216. doi: 10.1080/01621459.2018.1537922. Epub 2019 Apr 22.

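A surrogate ℓ₀ sparse Cox's regression with applications to sparse high-dimensional massive sample size time-to-event data.

Stat Med. 2020 Mar 15;39(6):675-686. doi: 10.1002/sim.8438. Epub 2019 Dec 8.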

A fast divide-and-conquer sparse Cox regression.

Biostatistics. 2021 Apr 10;22(2):381-401. doi: 10.1093/biostatistics/kxz036.

Broken adaptive ridge regression and its asymptotic properties.

J Multivar Anal. 2018 Nov;168:334-351. doi: 10.1016/j.jmva.2018.08.007. Epub 2018 Aug 23.

Improving reproducibility by using high-throughput observational studies with empirical calibration.

Philos Trans A Math Phys Eng Sci. 2018 Sep 13;376(2128). doi: 10.1098/rsta.2017.0356.

High-dimensional variable selection and prediction under competing risks with application to SEER-Medicare linked data.

Stat Med. 2018 Oct 30;37(24):3486-3502. doi: 10.1002/sim.7822. Epub 2018 May 29.

Association of Race and Ethnicity With Live Donor Kidney Transplantation in the United States From 1995 to 2014.

JAMA. 2018 Jan 2;319(1):49-61. doi: 10.1001/jama.2017.19152.

Group and within-group variable selection for competing risks data.

Lifetime Data Anal. 2018 Jul;24(3):407-424. doi: 10.1007/s10985-017-9400-9. Epub 2017 Aug 4.

Differential impact of smoking on mortality and kidney transplantation among adult Men and Women undergoing dialysis.

BMC Nephrol. 2016 Jul 26;17:95. doi: 10.1186/s12882-016-0311-x.
