Best Ana F, Wolfson David B
National Institutes of Health, National Cancer Institute, Division of Cancer Epidemiology and Genetics, Biostatistics Branch, 9609 Medical Center Drive, MSC 9776, Bethesda, MD 20892, U.S.A.
McGill University Department of Mathematics and Statistics, Burnside Hall Room 1005, 805 Sherbrooke Street West, Montreal Quebec, Canada H3A 0B9.
Can J Stat. 2017 Mar;45(1):4-28. doi: 10.1002/cjs.11311. Epub 2017 Feb 27.
The determination of risk factors for disease incidence has been the subject of much epidemiologic research. To this end, a common study design follows an initially disease-free cohort, recording the dates of disease incidence (onset) and ascertaining covariate (putative risk factor) information on the full cohort. However, collecting certain covariate information on all study subjects may be prohibitively expensive, and the nested case-control study has therefore commonly been used. The high cost of full covariate information on all subjects also arises when determining risk factors for "failure" (death, say) following disease onset, in particular in a prevalent cohort study with follow-up, in which a cohort of subjects with existing disease is followed. Here we adapt nested case-control designs to the setting of a prevalent cohort study with follow-up, a topic not previously addressed in the literature. We provide the partial likelihood under risk set sampling and state the asymptotic properties of the estimated covariate effects and baseline cumulative hazard. We address the following design questions in the context of prevalent cohort studies with follow-up: How many subjects should be included in the sampled risk sets for efficient estimation? In what way is the proportion of censored subjects associated with the benefit of a nested case-control design? What proportion of overall variance is attributable to risk set sampling? This work is motivated by the anticipated analysis of data on survival with Parkinson's disease, being collected as part of the ongoing Canadian Longitudinal Study on Aging.