Advanced Methodology and Data Science, Novartis Pharma AG, Basel, Switzerland.
Advanced Methodology and Data Science, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA.
Stat Med. 2021 Nov 10;40(25):5453-5473. doi: 10.1002/sim.9134. Epub 2021 Jul 30.
One of the key challenges of personalized medicine is to identify which patients will respond positively to a given treatment. The area of subgroup identification focuses on this challenge, that is, on identifying groups of patients that exhibit desirable characteristics, such as an enhanced treatment effect. A crucial first step towards subgroup identification is to identify the baseline variables (eg, biomarkers) that influence the treatment effect, which are known as predictive variables. Many subgroup discovery algorithms return importance scores that capture the variables' predictive strength. However, a major limitation of these scores is that they do not answer the core question: "Which variables are actually predictive?" We answer this question using the knockoff framework, a general framework for controlling the false discovery rate in variable selection. Knockoffs have so far been used for prognostic variable selection; in contrast, our work is the first that uses knockoffs for predictive variable selection. We introduce two novel knockoff filters: one parametric, building on variable importance scores derived from a penalized linear regression model, and one non-parametric, building on causal forest variable importance scores. We conduct extensive simulations to validate the performance of the proposed methodology, and we also apply the proposed methods to data from a randomized clinical trial.
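To illustrate how a knockoff filter turns importance scores into a selected set with false discovery rate control, the following is a minimal sketch of the generic knockoff+ selection step. It assumes that a statistic W_j has already been computed for each variable by contrasting the importance of the original variable with that of its knockoff copy, so that large positive values suggest a real signal. It does not reproduce the two filters proposed in the paper; the function names `knockoff_plus_threshold` and `knockoff_select` and the example values of W are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(W, q=0.1):
    """Return the knockoff+ threshold for statistics W at target FDR level q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    If no such t exists, np.inf is returned and nothing is selected.
    """
    W = np.asarray(W, dtype=float)
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Negative statistics serve as an estimate of the number of false positives.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def knockoff_select(W, q=0.1):
    """Return the indices of the variables whose statistic reaches the threshold."""
    W = np.asarray(W, dtype=float)
    tau = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= tau)


# Hypothetical statistics for ten variables (illustrative values only).
W_example = [5.0, 4.0, 3.0, 2.5, 2.0, -1.0, 0.5, -0.3, 0.2, 1.5]
selected = knockoff_select(W_example, q=0.25)
```

In this sketch only the sign and magnitude of each W_j matter, so the same selection step can be applied to statistics derived from any importance score, whether parametric or non-parametric.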