Wang Shikun, Ning Jing, Xu Ying, Shih Ya-Chen Tina, Shen Yu, Li Liang
Department of Biostatistics, The University of Texas MD Anderson Cancer Center.
Department of Health Services Research, The University of Texas MD Anderson Cancer Center.
Ann Appl Stat. 2023 Mar;17(1):881-899. Epub 2023 Jan 24.
Insurance claims data are an increasingly important resource for health policy research because they provide a longitudinal assessment of clinical outcomes in cancer care. Policy makers are particularly interested in population-level information on the medical cost trajectory from disease diagnosis to a terminal event, such as death. Estimating the mean cost trajectory poses several statistical challenges. The trajectory is usually highly nonlinear, and its duration varies with the population distribution of the time from diagnosis to death. The terminal event may be right-censored, in which case the subsequent costs are missing. Medical costs often have skewed distributions with zero-inflation and heteroscedasticity, which commonly used parametric families of distributions may fit poorly. In this paper, we propose a flexible semi-parametric model that addresses these challenges without imposing a distributional assumption on the cost data. The estimation procedure is based on generalized estimating equations with censored covariates. The proposed model adopts a bivariate surface that quantifies the interrelationship between longitudinal medical costs and survival, and yields the nonlinear population mean cost trajectory conditional on the death time. We develop a novel generalized estimating equations algorithm that accommodates covariates subject to right-censoring without fully specifying the joint distribution of the cost and survival data. We provide theoretical and simulation-based justification for the proposed approach, and apply the method to estimate cost trajectories of prostate cancer patients from the Surveillance, Epidemiology, and End Results (SEER)-Medicare linked database.