Department of Statistics, TU Dortmund University, 44221, Dortmund, Germany.
BMC Med Inform Decis Mak. 2021 Dec 7;21(1):342. doi: 10.1186/s12911-021-01698-1.
An important task in clinical medicine is the construction of risk prediction models for specific subgroups of patients based on high-dimensional molecular measurements such as gene expression data. The major objectives in modeling high-dimensional data are good prediction performance and feature selection, that is, finding a subset of predictors that are truly associated with a clinical outcome such as a time-to-event endpoint. In clinical practice, this task is challenging because patient cohorts are typically small and can be heterogeneous in the relationship between predictors and outcome. When data from several subgroups of patients with the same or similar disease are available, as in multicenter studies, it is tempting to combine them to increase the sample size. However, heterogeneity between subgroups can lead to biased results, and subgroup-specific effects may remain undetected.
For this situation, we propose a penalized Cox regression model with a weighted version of the Cox partial likelihood that includes patients of all subgroups but assigns them individual weights based on their subgroup affiliation. The weights are estimated from the data such that patients who are likely to belong to the subgroup of interest obtain higher weights in the subgroup-specific model.
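To make the idea of a weighted partial likelihood concrete, the following is a minimal sketch of a penalized, weighted Cox objective, not the authors' implementation. Several things here are assumptions that the abstract does not state: the function name `weighted_cox_penalized_nll` is hypothetical, the penalty is taken to be a lasso (L1) penalty with strength `lam`, the weights enter both the event contributions and the risk-set sums, tied event times are handled in the simple Breslow manner, and the weights are supplied as given rather than estimated from the data.

```python
import numpy as np


def weighted_cox_penalized_nll(beta, X, time, event, weights, lam=0.0):
    """Negative weighted Cox partial log-likelihood plus an L1 penalty.

    Illustrative sketch only: the L1 penalty and the exact weighting scheme
    are assumptions, not details taken from the abstract.
    """
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    weights = np.asarray(weights, dtype=float)

    eta = X @ beta  # linear predictor for every patient

    # at_risk[i, j] is 1.0 if patient j is still at risk at patient i's time
    at_risk = (time[None, :] >= time[:, None]).astype(float)

    # Weighted risk-set sum: each patient at risk contributes w_j * exp(eta_j)
    weighted_risk = at_risk @ (weights * np.exp(eta))

    # Only observed events with positive weight contribute; this also avoids
    # evaluating log(0) for patients whose weight is zero.
    mask = (event > 0) & (weights > 0)
    log_lik = np.sum(weights[mask] * (eta[mask] - np.log(weighted_risk[mask])))

    return -log_lik + lam * np.sum(np.abs(beta))


# Toy usage: three patients, one covariate, all events observed.
X = np.array([[0.5], [-1.0], [2.0]])
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 1, 1])
weights = np.array([1.0, 1.0, 0.0])  # third patient is excluded entirely
value = weighted_cox_penalized_nll(np.zeros(1), X, time, event, weights)
```

With all weights equal to one, this reduces to the ordinary penalized Cox objective fitted on the pooled data. With weights of one inside the subgroup of interest and zero elsewhere, it reduces to a model fitted on that subgroup alone. Weights between these extremes let patients from other subgroups contribute partially.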
Our proposed approach is evaluated through simulations and application to real lung cancer cohorts, and compared to existing approaches. Simulation results demonstrate that our proposed model is superior to standard approaches in terms of prediction performance and variable selection accuracy when the sample size is small.
The results suggest that sharing information between subgroups by incorporating appropriate weights into the likelihood can increase power to identify the prognostic covariates and improve risk prediction.