Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
Incyte Corporation, Wilmington, Delaware, USA.
Stat Med. 2021 Jun 15;40(13):3181-3195. doi: 10.1002/sim.8972. Epub 2021 Apr 5.
In cancer studies, it is important to understand disease heterogeneity among patients so that precision medicine can target high-risk patients at the right time. Many feature variables, such as demographic variables and biomarkers, combined with a patient's survival outcome, can be used to infer such latent heterogeneity. In this work, we propose a mixture model for each patient's latent survival pattern, where the mixing probabilities for the latent groups are modeled through a multinomial distribution. The Bayesian information criterion is used to select the number of latent groups. Furthermore, we incorporate variable selection with the adaptive lasso into inference so that only a few feature variables are selected to characterize the latent heterogeneity. We show that our adaptive lasso estimator has oracle properties when the number of parameters diverges with the sample size. The finite-sample performance is evaluated in a simulation study, and the proposed method is illustrated with two datasets.
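The ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a multinomial-logit link from features to mixing probabilities, exponential survival components with right censoring, and a user-supplied parameter count for the BIC; none of these details is stated in the abstract. All function and variable names (`mixing_probabilities`, `gamma`, `rates`, and so on) are hypothetical.

```python
import numpy as np


def mixing_probabilities(X, gamma):
    """Mixing probabilities from a multinomial-logit link (assumed form).

    X: (n, p) feature matrix; gamma: (p, G-1) coefficients.
    The last latent group is the reference with linear predictor 0.
    """
    eta = np.hstack([X @ gamma, np.zeros((X.shape[0], 1))])
    eta = eta - eta.max(axis=1, keepdims=True)  # stabilise the softmax
    w = np.exp(eta)
    return w / w.sum(axis=1, keepdims=True)


def log_likelihood(time, event, X, gamma, rates):
    """Observed-data log-likelihood of the mixture.

    Components are exponential with hazard rates[g] (an assumption).
    event = 1 contributes the density, event = 0 (right-censored)
    contributes the survival function.
    """
    pi = mixing_probabilities(X, gamma)                      # (n, G)
    log_comp = (event[:, None] * np.log(rates)[None, :]
                - rates[None, :] * time[:, None])            # (n, G)
    a = np.log(pi) + log_comp
    m = a.max(axis=1, keepdims=True)                         # log-sum-exp
    return float(np.sum(m[:, 0] + np.log(np.exp(a - m).sum(axis=1))))


def adaptive_lasso_penalty(gamma, gamma_init, lam):
    """Adaptive lasso penalty: lam * sum_j |gamma_j| / |gamma_init_j|.

    gamma_init is an initial (unpenalised) estimate used to build weights.
    """
    weights = 1.0 / np.abs(gamma_init)
    return float(lam * np.sum(weights * np.abs(gamma)))


def bic(loglik, n_params, n):
    """Bayesian information criterion; smaller is better."""
    return -2.0 * loglik + n_params * np.log(n)
```

Under these assumptions, one would maximise `log_likelihood(...) - adaptive_lasso_penalty(...)` for each candidate number of groups and keep the fit with the smallest `bic` value.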