Improved Horvitz-Thompson Estimation of Model Parameters from Two-phase Stratified Samples: Applications in Epidemiology.

Authors

Breslow Norman E, Lumley Thomas, Ballantyne Christie M, Chambless Lloyd E, Kulich Michal

Affiliation

Department of Biostatistics, University of Washington, Seattle, WA, USA, Tel.: +1-206-543-2035.

Publication information

Stat Biosci. 2009 May 1;1(1):32. doi: 10.1007/s12561-009-9001-6.

DOI:10.1007/s12561-009-9001-6

PMID:20174455

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2822363/

Abstract

The case-cohort study involves two-phase sampling: simple random sampling from an infinite super-population at phase one and stratified random sampling from a finite cohort at phase two. Standard analyses of case-cohort data involve solution of inverse probability weighted (IPW) estimating equations, with weights determined by the known phase two sampling fractions. The variance of parameter estimates in (semi)parametric models, including the Cox model, is the sum of two terms: (i) the model based variance of the usual estimates that would be calculated if full data were available for the entire cohort; and (ii) the design based variance from IPW estimation of the unknown cohort total of the efficient influence function (IF) contributions. This second variance component may be reduced by adjusting the sampling weights, either by calibration to known cohort totals of auxiliary variables correlated with the IF contributions or by their estimation using these same auxiliary variables. Both adjustment methods are implemented in the R survey package. We derive the limit laws of coefficients estimated using adjusted weights. The asymptotic results suggest practical methods for construction of auxiliary variables that are evaluated by simulation of case-cohort samples from the National Wilms Tumor Study and by log-linear modeling of case-cohort data from the Atherosclerosis Risk in Communities Study. Although not semiparametric efficient, estimators based on adjusted weights may come close to achieving full efficiency within the class of augmented IPW estimators.
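
The weight adjustment described in the abstract can be illustrated with a small simulation. The sketch below is a simplified stand-in written with the Python standard library, not the R survey package and not the authors' code. It assumes a hypothetical cohort in which `y` plays the role of the influence function contributions (observed only at phase two) and `a` is a hypothetical auxiliary variable known for the whole cohort. All cases are kept and controls are subsampled, and the design weights are then linearly calibrated so that the weighted sample totals of (1, `a`) match the known cohort totals. The function names and the simulation settings are illustrative assumptions.

```python
import random
import statistics


def calibrate_weights(w, a, total_n, total_a):
    """Linear calibration: adjust design weights w so the weighted sample
    totals of x = (1, a) equal the known cohort totals (total_n, total_a)."""
    m00 = sum(w)
    m01 = sum(wi * ai for wi, ai in zip(w, a))
    m11 = sum(wi * ai * ai for wi, ai in zip(w, a))
    d0, d1 = total_n - m00, total_a - m01          # calibration shortfalls
    det = m00 * m11 - m01 * m01
    lam0 = (m11 * d0 - m01 * d1) / det             # solve the 2x2 system
    lam1 = (m00 * d1 - m01 * d0) / det
    return [wi * (1.0 + lam0 + lam1 * ai) for wi, ai in zip(w, a)]


def one_replicate(y, a, strata, n_controls, rng):
    """Draw one phase-two sample; return (HT, calibrated) totals of y."""
    cases = [i for i, s in enumerate(strata) if s == 1]
    controls = [i for i, s in enumerate(strata) if s == 0]
    picked = rng.sample(controls, n_controls)      # sample within the control stratum
    idx = cases + picked
    # Design weights are inverse sampling fractions: 1 for cases, N0/n0 for controls.
    w = [1.0] * len(cases) + [len(controls) / n_controls] * n_controls
    ht = sum(wi * y[i] for wi, i in zip(w, idx))
    w_cal = calibrate_weights(w, [a[i] for i in idx], len(y), sum(a))
    cal = sum(wi * y[i] for wi, i in zip(w_cal, idx))
    return ht, cal


def compare_variances(n_cohort=2000, n_controls=200, reps=200, seed=1):
    """Design-based variance of both estimators over repeated phase-two draws
    from one fixed, simulated cohort (hypothetical settings)."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n_cohort)]
    y = [ai + 0.5 * rng.gauss(0, 1) for ai in a]   # y strongly correlated with a
    strata = [1 if rng.random() < 0.1 else 0 for _ in range(n_cohort)]
    results = [one_replicate(y, a, strata, n_controls, rng) for _ in range(reps)]
    var_ht = statistics.variance([r[0] for r in results])
    var_cal = statistics.variance([r[1] for r in results])
    return var_ht, var_cal
```

Calling `compare_variances()` returns the empirical variances of the Horvitz-Thompson and calibrated totals. Because `a` is constructed here to correlate strongly with `y`, the calibrated total should vary much less, which mirrors the reduction in the design-based variance component that the abstract attributes to adjusted weights.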

Similar articles

Improved Horvitz-Thompson Estimation of Model Parameters from Two-phase Stratified Samples: Applications in Epidemiology.

Stat Biosci. 2009 May 1;1(1):32. doi: 10.1007/s12561-009-9001-6.

Optimal sampling for design-based estimators of regression models.

Stat Med. 2022 Apr 15;41(8):1482-1497. doi: 10.1002/sim.9300. Epub 2022 Jan 6.

Estimation in the semiparametric accelerated failure time model with missing covariates: improving efficiency through augmentation.

J Am Stat Assoc. 2017;112(519):1221-1235. doi: 10.1080/01621459.2016.1205500. Epub 2017 Apr 25.

Analysis of two-phase sampling data with semiparametric additive hazards models.

Lifetime Data Anal. 2017 Jul;23(3):377-399. doi: 10.1007/s10985-016-9363-2. Epub 2016 Mar 19.

Comparing Parametric, Nonparametric, and Semiparametric Estimators: The Weibull Trials.

Am J Epidemiol. 2021 Aug 1;190(8):1643-1651. doi: 10.1093/aje/kwab024.

Weighted likelihood estimation under two-phase sampling.

Ann Stat. 2013 Feb 1;41(1):269-295. doi: 10.1214/12-AOS1073.

Z-estimation and stratified samples: application to survival models.

Lifetime Data Anal. 2015 Oct;21(4):493-516. doi: 10.1007/s10985-014-9317-5. Epub 2015 Jan 15.

Connections between survey calibration estimators and semiparametric models for incomplete data.

Int Stat Rev. 2011 Aug;79(2):200-220. doi: 10.1111/j.1751-5823.2011.00138.x.

Control function assisted IPW estimation with a secondary outcome in case-control studies.

Stat Sin. 2017 Apr;27(2):785-804. doi: 10.5705/ss.202015.0116.

Model misspecification and bias for inverse probability weighting estimators of average causal effects.

Biom J. 2023 Feb;65(2):e2100118. doi: 10.1002/bimj.202100118. Epub 2022 Aug 31.

Cited by

Ascertainment Conditional Maximum Likelihood for Continuous Outcome Under Two-Phase Response-Selective Design.

Stat Med. 2025 Jul;44(15-17):e70111. doi: 10.1002/sim.70111.

Software Application Profile: CaseCohortCoxSurvival-an R package for case-cohort inference for relative hazard and pure risk under the Cox model.

Int J Epidemiol. 2025 Feb 16;54(2). doi: 10.1093/ije/dyaf016.

Efficient risk-based collection of biospecimens in cohort studies: designing a prospective study of diagnostic performance for multicancer detection tests.

Am J Epidemiol. 2025 Jan 8;194(1):243-253. doi: 10.1093/aje/kwae139.

Cox model inference for relative hazard and pure risk from stratified weight-calibrated case-cohort data.

Lifetime Data Anal. 2024 Jul;30(3):572-599. doi: 10.1007/s10985-024-09621-2. Epub 2024 Apr 2.

Weight calibration in the joint modelling of medical cost and mortality.

Stat Methods Med Res. 2024 Apr;33(4):728-742. doi: 10.1177/09622802241236935. Epub 2024 Mar 6.

Three-phase generalized raking and multiple imputation estimators to address error-prone data.

Stat Med. 2024 Jan 30;43(2):379-394. doi: 10.1002/sim.9967. Epub 2023 Nov 21.

Use of nonsteroidal anti-inflammatory drugs and poor olfaction in women.

Int Forum Allergy Rhinol. 2024 Mar;14(3):639-650. doi: 10.1002/alr.23241. Epub 2023 Aug 7.

Vitamin D Metabolites and Risk of Cardiovascular Disease in Chronic Kidney Disease: The CRIC Study.

J Am Heart Assoc. 2023 Jul 18;12(14):e028561. doi: 10.1161/JAHA.122.028561. Epub 2023 Jul 8.

Estimation of conditional cumulative incidence functions under generalized semiparametric regression models with missing covariates, with application to analysis of biomarker correlates in vaccine trials.

Can J Stat. 2023 Mar;51(1):235-257. doi: 10.1002/cjs.11693. Epub 2022 Feb 24.

Optimal sampling allocation for outcome-dependent designs in cluster-correlated data settings.

Stat Methods Med Res. 2022 Dec;31(12):2400-2414. doi: 10.1177/09622802221122423. Epub 2022 Aug 30.

References

A general asymptotic theory for maximum likelihood estimation in semiparametric regression models with censored data.

Stat Sin. 2010 Apr;20(2):871-910.

A Z-theorem with Estimated Nuisance Parameters and Correction Note for 'Weighted Likelihood for Semiparametric Models and Two-phase Stratified Samples, with Application to Cox Regression'.

Scand Stat Theory Appl. 2008 Mar 1;35(1):186-192. doi: 10.1111/j.1467-9469.2007.00574.x.

Using the whole cohort in the analysis of case-cohort data.

Am J Epidemiol. 2009 Jun 1;169(11):1398-405. doi: 10.1093/aje/kwp055. Epub 2009 Apr 8.

The epidemiology of Lp-PLA(2): distribution and correlation with cardiovascular risk factors in a population-based cohort.

Atherosclerosis. 2007 Feb;190(2):388-96. doi: 10.1016/j.atherosclerosis.2006.02.016. Epub 2006 Mar 13.

Lipoprotein-associated phospholipase A2, high-sensitivity C-reactive protein, and risk for incident coronary heart disease in middle-aged men and women in the Atherosclerosis Risk in Communities (ARIC) study.

Circulation. 2004 Feb 24;109(7):837-42. doi: 10.1161/01.CIR.0000116763.91992.F1. Epub 2004 Feb 2.

Augmented inverse probability weighted estimator for Cox missing covariate regression.

Biometrics. 2001 Jun;57(2):414-9. doi: 10.1111/j.0006-341x.2001.00414.x.

Exposure stratified case-cohort designs.

Lifetime Data Anal. 2000 Mar;6(1):39-58. doi: 10.1023/a:1009661900674.

Analysis of case-cohort designs.

J Clin Epidemiol. 1999 Dec;52(12):1165-72. doi: 10.1016/s0895-4356(99)00102-x.

Comparison between single-dose and divided-dose administration of dactinomycin and doxorubicin for patients with Wilms' tumor: a report from the National Wilms' Tumor Study Group.

J Clin Oncol. 1998 Jan;16(1):237-45. doi: 10.1200/JCO.1998.16.1.237.

Robust variance estimation for the case-cohort design.

Biometrics. 1994 Dec;50(4):1064-72.
