Department of Mathematics and Statistics, McGill University, Montreal, Quebec, Canada.
Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
Stat Methods Med Res. 2019 Oct-Nov;28(10-11):3333-3345. doi: 10.1177/0962280218804275. Epub 2018 Oct 8.
It is frequently of interest to estimate how long individuals survive with a disease, that is, the time between disease onset and the occurrence of a clinical endpoint such as death. Epidemiologic survival data are commonly collected either from an incident cohort, whose members' disease onset occurs after the study baseline date, or from a prevalent cohort, whose members already have the disease at baseline and are followed forward in time. Incident cohort survival data are limited by study termination (long survival times are right-censored), while prevalent cohort data are biased by left truncation (individuals who reach the endpoint before baseline are never observed). In this article, we investigate the advantages, for estimating the survivor function, of a study design featuring simultaneous follow-up of prevalent and incident cohorts. Our analyses are supported by simulations and illustrated using data on survival after myotonic dystrophy diagnosis from the United Kingdom Clinical Practice Research Datalink (CPRD). We demonstrate that the nonparametric maximum likelihood estimator (NPMLE) based on combined incident and prevalent cohort data estimates the true survivor function very well, even for moderate sample sizes, and ameliorates the disadvantages of using a purely incident or purely prevalent cohort.
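As a rough illustration of how incident and prevalent cohort data can be pooled, the sketch below computes a product-limit survivor estimate with delayed entry, a standard nonparametric way to handle left-truncated, right-censored data. It is a minimal sketch under stated assumptions, not necessarily the estimator studied in the article: the function name `product_limit`, the toy data, and the convention that incident subjects enter at time 0 while prevalent subjects enter at their time since onset at baseline are all hypothetical choices made for this example.

```python
import numpy as np

def product_limit(entry, time, event):
    """Product-limit survivor estimate with delayed entry (left truncation).

    entry: time since disease onset at which follow-up starts
           (0 for incident subjects, > 0 for prevalent subjects)
    time:  time since onset at death or censoring (must exceed entry)
    event: 1 if death was observed, 0 if censored
    """
    entry = np.asarray(entry, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    event_times = np.unique(time[event])  # sorted distinct death times
    surv = []
    s = 1.0
    for t in event_times:
        # A subject is at risk at t only if already under observation
        # (entry < t) and not yet dead or censored (time >= t).
        at_risk = np.sum((entry < t) & (time >= t))
        deaths = np.sum(event & (time == t))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Hypothetical toy data: three incident subjects (entry 0) followed by
# three prevalent subjects (entry = years since onset at baseline).
entry = [0, 0, 0, 1, 3, 4]
time = [2, 5, 6, 4, 7, 8]
event = [1, 0, 1, 1, 1, 0]

times, surv = product_limit(entry, time, event)
for t, s in zip(times, surv):
    print(f"S({t:g}) = {s:.4f}")
```

The only difference from the ordinary Kaplan-Meier calculation is the `entry < t` condition in the risk set: prevalent subjects contribute to the estimate only from the point at which they came under observation, which is what corrects for left truncation, while incident subjects contribute from onset.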