Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford and Department of Statistics, University of Oxford, Oxford, OX3 7LF, UK.
Novartis Pharma AG, CH-4056 Basel, Switzerland.
Biostatistics. 2024 Jul 1;25(3):681-701. doi: 10.1093/biostatistics/kxad012.
Existing methods for fitting continuous-time Markov models (CTMM) in the presence of covariates scale poorly because a matrix exponential must be computed for each observation, which is computationally expensive. In this article, we propose an optimization technique for CTMM that combines a stochastic gradient descent algorithm with differentiation of the matrix exponential via a Padé approximation. This approach makes fitting large-scale data feasible. We present two methods for computing standard errors: a novel approach using the Padé expansion, and another using a power series expansion of the matrix exponential. Through simulations, we find improved performance relative to existing CTMM methods, and we demonstrate the method on the large-scale multiple sclerosis NO.MS data set.
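The general idea of pairing stochastic gradient steps with a derivative of the matrix exponential can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a three-state model without covariates, a log-rate parameterization of the generator, and observations given as hypothetical `(from_state, to_state, dt)` tuples. It relies on SciPy's `expm`, which uses a Padé approximation with scaling and squaring, and on `expm_frechet` for the directional derivative. The names `build_generator`, `loglik_and_grad`, and `sgd_fit` are illustrative only.

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

N_STATES = 3  # assumption: a small 3-state model for illustration
# Positions of the free (off-diagonal) transition rates, one parameter each.
OFF_DIAG = [(a, b) for a in range(N_STATES) for b in range(N_STATES) if a != b]


def build_generator(theta):
    """Build a generator matrix Q from log-rates theta (rows sum to zero)."""
    Q = np.zeros((N_STATES, N_STATES))
    for k, (a, b) in enumerate(OFF_DIAG):
        Q[a, b] = np.exp(theta[k])
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q


def loglik_and_grad(theta, i, j, dt):
    """Log-likelihood log P_ij(dt) of one transition and its gradient in theta."""
    Q = build_generator(theta)
    P = expm(Q * dt)  # transition probabilities over an interval of length dt
    grad = np.zeros(len(theta))
    for k, (a, b) in enumerate(OFF_DIAG):
        # dQ/dtheta_k: the rate a->b increases, the diagonal entry compensates.
        dQ = np.zeros_like(Q)
        dQ[a, b] = np.exp(theta[k])
        dQ[a, a] = -np.exp(theta[k])
        # Frechet derivative of expm at Q*dt in direction dQ*dt gives dP/dtheta_k.
        dP = expm_frechet(Q * dt, dQ * dt, compute_expm=False)
        grad[k] = dP[i, j] / P[i, j]
    return np.log(P[i, j]), grad


def sgd_fit(obs, theta0, lr=0.05, epochs=20, batch_size=32, seed=0):
    """Plain mini-batch stochastic gradient ascent on the log-likelihood."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(epochs):
        order = rng.permutation(len(obs))
        for start in range(0, len(obs), batch_size):
            batch = [obs[idx] for idx in order[start:start + batch_size]]
            g = np.mean([loglik_and_grad(theta, *o)[1] for o in batch], axis=0)
            theta += lr * g  # ascent step: we maximize the log-likelihood
    return theta
```

Each mini-batch touches only a subset of observations, so the number of matrix exponentials per update is bounded by the batch size rather than by the size of the data set. A covariate-dependent model would rebuild `Q` per observation from its covariates, which this sketch omits.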