EFFICIENT ESTIMATION OF THE MAXIMAL ASSOCIATION BETWEEN MULTIPLE PREDICTORS AND A SURVIVAL OUTCOME.

Author information

Huang Tzu-Jung, Luedtke Alex, McKeague Ian W

Affiliations

Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center.

Department of Statistics, University of Washington.

Publication information

Ann Stat. 2023 Oct;51(5):1965-1988. doi: 10.1214/23-aos2313. Epub 2023 Dec 14.

Abstract

This paper develops a new approach to post-selection inference for screening high-dimensional predictors of survival outcomes. Post-selection inference for right-censored outcome data has been investigated in the literature, but much remains to be done to make the methods both reliable and computationally scalable in high dimensions. Machine learning tools are commonly used to provide predictions of survival outcomes, but the estimated effect of a selected predictor suffers from confirmation bias unless the selection is taken into account. The new approach involves the construction of semi-parametrically efficient estimators of the linear association between the predictors and the survival outcome, which are used to build a test statistic for detecting the presence of an association between any of the predictors and the outcome. Further, a stabilization technique reminiscent of bagging allows a normal calibration for the resulting test statistic, which enables the construction of confidence intervals for the maximal association between predictors and the outcome and also greatly reduces computational cost. Theoretical results show that this testing procedure is valid even when the number of predictors grows superpolynomially with sample size, and our simulations support this asymptotic guarantee at moderate sample sizes. The new approach is applied to the problem of identifying patterns in viral gene expression associated with the potency of an antiviral drug.
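The bias that arises when a predictor is selected and then evaluated on the same data can be illustrated with a small simulation. The sketch below is not the paper's estimator: it assumes an uncensored Gaussian outcome, independent Gaussian predictors, and plain Pearson correlation, and the function names and the values of `n`, `p`, and `reps` are hypothetical choices made for illustration only. Every predictor is truly unrelated to the outcome, yet the largest absolute sample correlation is far from zero, while the correlation of a predictor fixed in advance stays small.

```python
import numpy as np

def null_correlations(n, p, rng):
    """Return (max |corr| over p null predictors, |corr| of a pre-specified predictor)."""
    x = rng.standard_normal((n, p))   # predictors, all independent of the outcome
    y = rng.standard_normal(n)        # outcome (assumption: no censoring)
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation of each predictor column with the outcome
    corr = (xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    abs_corr = np.abs(corr)
    return abs_corr.max(), abs_corr[0]

def simulate(n=100, p=500, reps=200, seed=0):
    """Average the selected (maximal) and the fixed-predictor correlation over many datasets."""
    rng = np.random.default_rng(seed)
    draws = np.array([null_correlations(n, p, rng) for _ in range(reps)])
    return draws[:, 0].mean(), draws[:, 1].mean()

mean_selected, mean_fixed = simulate()
print(f"mean |corr| of the selected predictor: {mean_selected:.3f}")
print(f"mean |corr| of a fixed predictor:      {mean_fixed:.3f}")
```

The gap between the two averages is the reason a naive estimate for the selected predictor cannot be taken at face value, and it widens as `p` grows relative to `n`.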
