Breslow Norman E, Lumley Thomas, Ballantyne Christie M, Chambless Lloyd E, Kulich Michal
Department of Biostatistics, University of Washington, Seattle, Washington 98195-7232, USA.
Am J Epidemiol. 2009 Jun 1;169(11):1398-405. doi: 10.1093/aje/kwp055. Epub 2009 Apr 8.
Case-cohort data analyses often ignore valuable information on cohort members not sampled as cases or controls. The Atherosclerosis Risk in Communities (ARIC) study investigators, for example, typically report data for just the 10%-15% of subjects sampled for substudies of their cohort of 15,972 participants. Remaining subjects contribute to stratified sampling weights only. Analysis methods implemented in the freely available R statistical system (http://cran.r-project.org/) make better use of the data through adjustment of the sampling weights via calibration or estimation. By reanalyzing data from an ARIC study of coronary heart disease and simulations based on data from the National Wilms Tumor Study, the authors demonstrate that such adjustment can dramatically improve the precision of hazard ratios estimated for baseline covariates known for all subjects. Adjustment can also improve precision for partially missing covariates, those known for substudy participants only, when their values may be imputed with reasonable accuracy for the remaining cohort members. Links are provided to software, data sets, and tutorials showing in detail the steps needed to carry out the adjusted analyses. Epidemiologists are encouraged to consider use of these methods to enhance the accuracy of results reported from case-cohort analyses.
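The weight adjustment via calibration mentioned in the abstract can be illustrated with a minimal sketch. This is not the R software the abstract refers to. It is a plain-Python illustration of linear calibration, assuming a single auxiliary variable known for every cohort member and a simple random subsample. The function name `calibrate_linear`, the toy cohort, and all parameter names are hypothetical. The idea is to adjust the design weights so that the weighted subsample reproduces the known cohort size and the known cohort total of the auxiliary variable.

```python
import random


def calibrate_linear(design_weights, aux_sample, cohort_size, aux_cohort_total):
    """Linear calibration with an intercept and one auxiliary variable.

    Returns weights w_i = d_i * (1 + lam0 + lam1 * z_i) chosen so that
    sum(w_i) equals cohort_size and sum(w_i * z_i) equals aux_cohort_total.
    """
    d, z = design_weights, aux_sample
    # Weighted moments of x_i = (1, z_i) in the subsample.
    s0 = sum(d)
    s1 = sum(di * zi for di, zi in zip(d, z))
    s2 = sum(di * zi * zi for di, zi in zip(d, z))
    # Gaps between known cohort totals and their weighted sample estimates.
    g0 = cohort_size - s0
    g1 = aux_cohort_total - s1
    # Solve the 2x2 system [[s0, s1], [s1, s2]] @ lam = (g0, g1).
    det = s0 * s2 - s1 * s1
    lam0 = (s2 * g0 - s1 * g1) / det
    lam1 = (s0 * g1 - s1 * g0) / det
    return [di * (1 + lam0 + lam1 * zi) for di, zi in zip(d, z)]


# Toy data (assumption): an auxiliary covariate known for the whole cohort.
random.seed(0)
cohort = [random.gauss(50, 10) for _ in range(1000)]
sample = random.sample(cohort, 100)                 # simple random subsample
design = [len(cohort) / len(sample)] * len(sample)  # inverse sampling fraction

calibrated = calibrate_linear(design, sample, len(cohort), sum(cohort))
weighted_total = sum(w * z for w, z in zip(calibrated, sample))
```

After calibration, the weights sum to the cohort size and the weighted total of the auxiliary variable matches its cohort total. Estimates for quantities correlated with that variable can gain precision as a result, which is the mechanism the abstract describes.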