Wolock C J, Gilbert P B, Simon N, Carone M
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, 432 Guardian Drive, Philadelphia, Pennsylvania 19104, USA.
Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, 1100 Fairview Avenue North, PO Box 19024, Seattle, Washington 98109, USA.
Biometrika. 2024 Nov 4;112(2):asae061. doi: 10.1093/biomet/asae061. eCollection 2025.
Given a collection of features available for inclusion in a predictive model, it may be of interest to quantify the relative importance of a subset of features for the prediction task at hand. For example, in HIV vaccine trials, participant baseline characteristics are used to predict the probability of HIV acquisition over the intended follow-up period, and investigators may wish to understand how much certain types of predictors, such as behavioural factors, contribute to overall predictiveness. Time-to-event outcomes such as time to HIV acquisition are often subject to right censoring, and existing methods for assessing variable importance are typically not intended to be used in this setting. We describe a broad class of algorithm-agnostic variable importance measures for prediction in the context of survival data. We propose a nonparametric efficient estimation procedure that incorporates flexible learning of nuisance parameters, yields asymptotically valid inference and enjoys double robustness. We assess the performance of our proposed procedure via numerical simulations and analyse data from the HVTN 702 vaccine trial to inform enrolment strategies for future HIV vaccine trials.