Geskus R B
Municipal Health Service, Division of Public Health and Environment, Nieuwe Achtergracht 100, 1018 WT, Amsterdam, The Netherlands.
Stat Med. 2001 Mar 15;20(5):795-812. doi: 10.1002/sim.700.
In most cohort studies on HIV infection and AIDS, data on time from seroconversion to AIDS or death are doubly censored, both at the time origin and at the endpoint of interest. In epidemiological research, the most frequently adopted approach is to restrict the analysis to persons with narrow seroconversion intervals and to impute the midpoint of this interval as the date of seroconversion. For many cohort studies, the consequence is that a substantial proportion of the data is not used. We consider four methods that are expected to be less biased when all cohort data are used: two imputation methods, conditional mean and multiple imputation, and two likelihood maximization methods. We derive the likelihood structure of the cohort data and clarify its dependence on study design. All methods are applied to data from the Amsterdam cohort study among injection drug users. In a simulation study, the data-generating process of this cohort study is imitated, and the performance of midpoint, conditional mean and multiple imputation is compared. With midpoint imputation, two analyses are performed: one using the full data set and one restricted to the cases with small seroconversion intervals. Conditional mean imputation comes out as the preferred method: it gives the best results in terms of mean squared error. Moreover, when confidence intervals are computed with standard methods that ignore the uncertainty in the imputed date of seroconversion, coverage probabilities are almost correct.
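The abstract contrasts midpoint imputation with conditional mean and multiple imputation of the unobserved seroconversion date. The sketch below illustrates the difference for a single seroconversion interval from `left` (last negative test) to `right` (first positive test). It assumes, purely for illustration, a constant seroconversion hazard `rate`, so that the seroconversion time within the interval follows a truncated exponential distribution. This is a simplified illustration, not the paper's estimator: the paper derives the likelihood from the cohort design, and a real analysis would estimate the seroconversion distribution from the data rather than fix it. All function names and parameters are hypothetical.

```python
import math
import random


def midpoint_impute(left, right):
    """Midpoint imputation: place seroconversion at the centre of the interval."""
    return (left + right) / 2.0


def _check(left, right, rate):
    if right <= left:
        raise ValueError("right must be greater than left")
    if rate < 0:
        raise ValueError("rate (hazard) must be non-negative")


def conditional_mean_impute(left, right, rate):
    """E[S | left <= S <= right] under a hypothetical constant hazard `rate`.

    Within the interval, S - left is modelled as an exponential variable
    truncated to [0, right - left].
    """
    _check(left, right, rate)
    if rate == 0:
        # Limit as rate -> 0: the truncated distribution becomes uniform.
        return midpoint_impute(left, right)
    width = right - left
    x = rate * width
    # Mean of an exponential truncated to [0, width]:
    # 1/rate - width * exp(-x) / (1 - exp(-x)); expm1 keeps 1 - exp(-x) accurate.
    return left + 1.0 / rate - width * math.exp(-x) / (-math.expm1(-x))


def multiple_impute(left, right, rate, m, rng=None):
    """Draw m seroconversion dates, one per imputed data set, from the same
    truncated distribution using inverse-CDF sampling."""
    _check(left, right, rate)
    if rng is None:
        rng = random.Random()
    width = right - left
    draws = []
    for _ in range(m):
        p = rng.random()  # uniform on [0, 1)
        if rate == 0:
            u = p * width
        else:
            # Invert F(u) = (1 - exp(-rate*u)) / (1 - exp(-rate*width)).
            mass = -math.expm1(-rate * width)
            u = -math.log1p(-p * mass) / rate
        draws.append(left + u)
    return draws


if __name__ == "__main__":
    # Hypothetical 2-year seroconversion interval and hazard 0.5 per year.
    print(midpoint_impute(0.0, 2.0))                          # 1.0
    print(round(conditional_mean_impute(0.0, 2.0, 0.5), 3))   # 0.836
    print(multiple_impute(0.0, 2.0, 0.5, m=5, rng=random.Random(1)))
```

With a positive constant hazard, the density within the interval decreases, so the conditional mean lies before the midpoint; as `rate` approaches zero the two imputations coincide. Multiple imputation replaces the single imputed value by `m` draws, so that the spread between imputed data sets can be carried into the variance (for example via Rubin's rules), whereas single imputation ignores that uncertainty.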