Nonparametric analysis of doubly truncated and interval-censored data.

Author information

Department of Statistics, Tunghai University, Taichung.

Publication information

Stat Methods Med Res. 2022 Jun;31(6):1157-1170. doi: 10.1177/09622802221084133. Epub 2022 Mar 23.

Abstract

In epidemiological studies, it is easier to collect data only from individuals whose failure events fall within a calendar time interval. This design, known as interval sampling, leads to doubly truncated data. In many situations, the calendar time of the failure event can only be recorded within time intervals, which leads to doubly truncated and interval-censored (DTIC) data. First, we point out that although the existing methods for DTIC data work adequately under the sampling scheme for doubly truncated data (Scheme 1), Scheme 1 is not realistic for DTIC data. Second, we consider a commonly used sampling scheme (Scheme 2), under which individuals are included in the sample based on diagnosis date. We point out that under Scheme 2, because the assumptions of Scheme 1 are violated, the nonparametric maximum likelihood estimator (NPMLE) of the cumulative distribution function is severely biased if the likelihood function for Scheme 1 is used. To overcome this difficulty, we define a target population under which a sampling scheme (Scheme 3) can be implemented such that appropriate truncation variables can be defined and the NPMLE of the cumulative distribution function can be obtained using the expectation-maximization algorithm. We also consider estimation of the joint distribution function for successive duration times. Using the first failure times imputed from the NPMLE under Scheme 3, we obtain imputed right-censored data for the second failure event. Based on the imputed data, we propose a nonparametric estimator of the joint distribution function using the inverse-probability-weighted approach. Simulation studies demonstrate that the proposed method performs well with moderate sample sizes.
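To make the idea of interval sampling and double truncation concrete, the following is a minimal Python sketch, not the estimator proposed in the paper. It assumes failure times are observed exactly (no interval censoring) and uses the classical self-consistency iteration for the NPMLE under double truncation rather than the paper's EM algorithm for DTIC data. The function names, the uniform onset distribution, and the exponential duration distribution are all hypothetical choices made for illustration.

```python
import random


def simulate_interval_sampling(n_pop, t0, t1, rate=0.5, seed=0):
    """Hypothetical interval-sampling design: keep a subject only if the
    failure event occurs inside the calendar window [t0, t1]."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n_pop):
        onset = rng.uniform(t0 - 10.0, t1)  # calendar time of the initiating event (assumed uniform)
        t = rng.expovariate(rate)           # duration until failure (assumed exponential)
        u, v = t0 - onset, t1 - onset       # left and right truncation limits on the duration scale
        if u <= t <= v:                     # failure date falls inside the sampling window
            sample.append((t, u, v))
    return sample


def npmle_doubly_truncated(t, u, v, tol=1e-8, max_iter=2000):
    """Self-consistency iteration for the NPMLE of the failure-time
    distribution from doubly truncated data with exactly observed times.
    Returns the probability mass placed on each observed t[j]."""
    n = len(t)
    # J[i][j] = 1 if t[j] lies inside subject i's truncation window [u[i], v[i]]
    J = [[1.0 if u[i] <= t[j] <= v[i] else 0.0 for j in range(n)]
         for i in range(n)]
    f = [1.0 / n] * n  # start from the empirical distribution
    for _ in range(max_iter):
        # F[i]: estimated probability that a failure time falls in window i
        F = [sum(J[i][j] * f[j] for j in range(n)) for i in range(n)]
        # score equation of the conditional likelihood: 1 / f_j = sum_i J_ij / F_i
        new = [1.0 / sum(J[i][j] / F[i] for i in range(n)) for j in range(n)]
        total = sum(new)
        new = [x / total for x in new]  # renormalise so the masses sum to one
        if max(abs(a - b) for a, b in zip(new, f)) < tol:
            return new
        f = new
    return f
```

A typical use would be to draw a sample with `simulate_interval_sampling(2000, 0.0, 5.0)`, unpack it with `t, u, v = zip(*sample)`, and compare the weighted distribution given by `npmle_doubly_truncated(t, u, v)` with the unweighted empirical distribution of `t`. The pure-Python loops cost on the order of n² operations per iteration, so a few hundred observations is a practical upper limit for this sketch.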
