Massachusetts General Hospital, Boston, MA, USA.
Harvard Medical School, Boston, MA, USA.
Stat Methods Med Res. 2024 Aug;33(8):1412-1423. doi: 10.1177/09622802241262523. Epub 2024 Jul 25.
An important task in health research is to characterize time-to-event outcomes such as disease onset or mortality in terms of a potentially high-dimensional set of risk factors. For example, prospective cohort studies of Alzheimer's disease (AD) typically enroll older adults for observation over several decades to assess the long-term impact of genetic and other factors on cognitive decline and mortality. The accelerated failure time model is particularly well-suited to such studies, structuring covariate effects as "horizontal" changes to the survival quantiles that conceptually reflect shifts in the outcome distribution due to lifelong exposures. However, this modeling task is complicated by the enrollment of adults at differing ages, and intermittent follow-up visits leading to interval-censored outcome information. Moreover, genetic and clinical risk factors are not only high-dimensional, but characterized by underlying grouping structures, such as by function or gene location. Such grouped high-dimensional covariates require shrinkage methods that directly acknowledge this structure to facilitate variable selection and estimation. In this paper, we address these considerations directly by proposing a Bayesian accelerated failure time model with a group-structured lasso penalty, designed for left-truncated and interval-censored time-to-event data. We develop an R package with a Markov chain Monte Carlo sampler for estimation. We present a simulation study examining the performance of this method relative to an ordinary lasso penalty and apply the proposed method to identify groups of predictive genetic and clinical risk factors for AD in the Religious Orders Study and Memory and Aging Project prospective cohort studies of AD and dementia.
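As a rough illustration of the ingredients named above, the sketch below computes the likelihood contribution of a single left-truncated, interval-censored observation under an accelerated failure time model, together with a group lasso penalty. The abstract does not state the baseline distribution, the prior, or the R package's interface, so everything here is an assumption made for illustration: a log-normal baseline (log T = x'beta + sigma * eps, with eps standard normal), a penalty weight of sqrt(group size), and the hypothetical function names `survival`, `log_lik_contribution`, and `group_lasso_penalty`. A subject who enters at age `entry` and whose event falls in the interval (`left`, `right`] contributes [S(left) - S(right)] / S(entry).

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def survival(t, lin_pred, sigma):
    """S(t | x) under an assumed log-normal AFT: log T = lin_pred + sigma * eps."""
    if t <= 0.0:
        return 1.0  # nobody has failed before time zero
    if math.isinf(t):
        return 0.0  # everybody has failed eventually
    return 1.0 - norm_cdf((math.log(t) - lin_pred) / sigma)


def log_lik_contribution(entry, left, right, x, beta, sigma):
    """Log-likelihood of one subject.

    entry : age at enrollment (left truncation time; 0 means no truncation)
    left  : last visit at which the event had not yet occurred
    right : first visit at which it had occurred (math.inf if right-censored)
    """
    lin_pred = sum(xj * bj for xj, bj in zip(x, beta))
    # Probability that the event falls in (left, right] ...
    numerator = survival(left, lin_pred, sigma) - survival(right, lin_pred, sigma)
    # ... conditional on having survived event-free to the entry age.
    denominator = survival(entry, lin_pred, sigma)
    return math.log(numerator) - math.log(denominator)


def group_lasso_penalty(beta, groups, lam):
    """lam * sum over groups of sqrt(group size) * Euclidean norm of the group.

    groups is a list of index lists, one per covariate group. In a Bayesian
    formulation the negative of this quantity would play the role of a
    log-prior on beta, up to an additive constant (an assumed form).
    """
    total = 0.0
    for idx in groups:
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        total += math.sqrt(len(idx)) * norm
    return lam * total


# One subject enrolled at age 70, event-free at 78, diagnosed by the visit at 81.
beta = [0.2, -0.1, 0.0]
x = [1.0, 0.5, 2.0]
groups = [[0, 1], [2]]  # e.g. two genes, with two and one variants
log_post_piece = (
    log_lik_contribution(70.0, 78.0, 81.0, x, beta, sigma=1.0)
    - group_lasso_penalty(beta, groups, lam=0.5)
)
```

Because the penalty acts on the norm of each group rather than on individual coefficients, it tends to shrink the coefficients of a group toward zero together, which is what distinguishes it from the ordinary lasso used as the comparator in the simulation study.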