Breslow Norman E, Amorim Gustavo, Pettinger Mary B, Rossouw Jacques
Department of Biostatistics, University of Washington, Seattle, WA, USA, Tel.: +1-206-543-1044.
Department of Statistics, University of Auckland, Auckland, NZ.
Stat Biosci. 2013 Nov 1;5(2). doi: 10.1007/s12561-013-9080-2.
Standard analyses of data from case-control studies nested in a large cohort ignore information available for cohort members who were not sampled for the sub-study. This paper reviews several methods designed to increase estimation efficiency by using more of the data, treating the case-control sample as a two- or three-phase stratified sample. When the methods were applied to a study of coronary heart disease among women in the hormone trials of the Women's Health Initiative, modest gains in the precision of regression coefficients were observed, and these gains increased with the amount of cohort information used in the analysis. The gains were particularly evident for pseudo- or maximum likelihood estimates, whose validity depends on the assumed model being correct. Coefficients estimated by inverse probability weighted methods, which are more robust to model misspecification, had larger standard errors. Such misspecification may have been responsible for an important difference in one key regression coefficient between the weighted estimates and those from the more efficient methods.
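The inverse probability weighting idea in the abstract can be illustrated with a minimal sketch. It assumes a simulated cohort (not the Women's Health Initiative data), a phase-two sample that keeps every case and an independent Bernoulli sample of controls, so each sampled subject is weighted by the inverse of its known sampling probability, and a logistic model with one covariate fitted by Newton-Raphson on the weighted score equations. The function names (`two_phase_sample`, `weighted_score`, `weighted_logistic`), the control sampling fraction and the true coefficients are hypothetical choices for illustration. This is not the authors' implementation: it ignores the risk-set matching of a real nested case-control design and does not show the pseudo- or maximum likelihood methods, or the use of additional cohort information, that the paper reviews.

```python
import math
import random


def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def two_phase_sample(cohort, control_fraction, rng):
    """Hypothetical phase-two design: keep every case (y == 1) and keep each
    control independently with probability control_fraction.
    Returns (x, y, weight) triples with weight = 1 / sampling probability."""
    sample = []
    for x, y in cohort:
        prob = 1.0 if y == 1 else control_fraction
        if rng.random() < prob:
            sample.append((x, y, 1.0 / prob))
    return sample


def weighted_score(data, b0, b1):
    """Weighted logistic score: sum_i w_i * (1, x_i) * (y_i - expit(b0 + b1 x_i))."""
    s0 = s1 = 0.0
    for x, y, w in data:
        r = y - expit(b0 + b1 * x)
        s0 += w * r
        s1 += w * r * x
    return s0, s1


def weighted_logistic(data, iters=25):
    """Solve the weighted score equations for (b0, b1) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1 = weighted_score(data, b0, b1)
        # Weighted information matrix [[h00, h01], [h01, h11]].
        h00 = h01 = h11 = 0.0
        for x, _y, w in data:
            mu = expit(b0 + b1 * x)
            v = w * mu * (1.0 - mu)
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += H^{-1} (g0, g1), with H inverted in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Simulated cohort (hypothetical parameters): logit P(y = 1 | x) = -4 + 0.7 x.
rng = random.Random(2013)
TRUE_B0, TRUE_B1 = -4.0, 0.7
N = 20000
cohort = []
for _ in range(N):
    x = rng.gauss(0.0, 1.0)
    y = 1 if rng.random() < expit(TRUE_B0 + TRUE_B1 * x) else 0
    cohort.append((x, y))

CONTROL_FRACTION = 0.05  # hypothetical phase-two sampling fraction for controls
sample = two_phase_sample(cohort, CONTROL_FRACTION, rng)

# Inverse probability weighted fit vs. an unweighted fit of the same sample.
b_weighted = weighted_logistic(sample)
b_unweighted = weighted_logistic([(x, y, 1.0) for x, y, _w in sample])

print("phase-two size:", len(sample), "of", N)
print("sum of weights (estimates N):", round(sum(w for _x, _y, w in sample)))
print("weighted   (b0, b1):", [round(b, 3) for b in b_weighted])
print("unweighted (b0, b1):", [round(b, 3) for b in b_unweighted])
```

Because only controls are subsampled, the unweighted fit should recover the slope but shift the intercept by roughly log(1 / CONTROL_FRACTION), whereas the weighted fit also targets the cohort-level intercept. A valid standard error for the weighted estimate would require a design-based (sandwich) variance, which this sketch omits.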