Department of Biostatistics, University of California at Los Angeles, Los Angeles, CA, USA.
Department of Medicine, University of California at Los Angeles, Los Angeles, CA, USA.
Comput Math Methods Med. 2022 Feb 8;2022:1362913. doi: 10.1155/2022/1362913. eCollection 2022.
Semiparametric joint models of longitudinal and competing risk data are computationally costly, and their current implementations do not scale well to massive biobank data. This paper identifies and addresses some key computational barriers in a semiparametric joint model for longitudinal and competing risk survival data. By developing and implementing customized linear scan algorithms, we reduce the computational complexities from O(n^2) or O(n^3) to O(n) in various steps including numerical integration, risk set calculation, and standard error estimation, where n is the number of subjects. Using both simulated and real-world biobank data, we demonstrate that these linear scan algorithms can speed up the existing methods by a factor of up to hundreds of thousands when n > 10^4, often reducing the runtime from days to minutes. We have developed an R package, FastJM, based on the proposed algorithms for joint modeling of longitudinal and competing risk time-to-event data and made it publicly available on the Comprehensive R Archive Network (CRAN).
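To illustrate the kind of saving a linear scan gives in risk set calculation, the following is a minimal Python sketch and not the FastJM implementation. It assumes each subject has an observed time and a nonnegative weight (for example, an exponentiated linear predictor), and that the risk set at a subject's time contains every subject whose time is at least as large. The function names and the toy data are hypothetical. The naive version rescans all subjects for each subject; the scan version sorts once and then makes a single pass with a running total, handling tied times as a group.

```python
from typing import List, Sequence


def risk_set_sums_naive(times: Sequence[float], weights: Sequence[float]) -> List[float]:
    """For each subject i, sum the weights of all j with times[j] >= times[i].

    Rescans every subject for every subject, so the cost is quadratic in n.
    """
    n = len(times)
    return [
        sum(weights[j] for j in range(n) if times[j] >= times[i])
        for i in range(n)
    ]


def risk_set_sums_linear_scan(times: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Same quantity, computed with one sort followed by a single pass.

    After sorting by decreasing time, the risk set only grows as the scan
    proceeds, so a running total replaces the inner loop.
    """
    n = len(times)
    # One-off sort (O(n log n)); the scan below is O(n).
    order = sorted(range(n), key=lambda i: times[i], reverse=True)
    out = [0.0] * n
    running = 0.0
    k = 0
    while k < n:
        # Absorb the whole group of tied times before assigning results,
        # because tied subjects belong to each other's risk sets.
        end = k
        while end < n and times[order[end]] == times[order[k]]:
            running += weights[order[end]]
            end += 1
        for m in range(k, end):
            out[order[m]] = running
        k = end
    return out


if __name__ == "__main__":
    toy_times = [5, 3, 3, 8, 1]      # hypothetical observed times
    toy_weights = [1, 2, 3, 4, 5]    # hypothetical subject weights
    print(risk_set_sums_naive(toy_times, toy_weights))
    print(risk_set_sums_linear_scan(toy_times, toy_weights))
```

Both functions return the same values on the toy data; only the cost differs. If the data are already stored in time order, the sort can be skipped and the whole computation is a single O(n) pass.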