Dehghan Mohammad Hossein, Duchesne Thierry
Department of Mathematics & Statistics, University of Sistan and Baluchestan, Zahedan, Iran.
Lifetime Data Anal. 2011 Apr;17(2):234-55. doi: 10.1007/s10985-010-9174-9. Epub 2010 Jun 11.
Simple nonparametric estimates of the conditional distribution of a response variable given a covariate are often useful for data exploration or for helping to specify or validate a parametric or semi-parametric regression model. In this paper we propose such an estimator for the case where the response variable is interval-censored and the covariate is continuous. Our approach consists of adding covariate-dependent weights to the self-consistency equation proposed by Turnbull (J R Stat Soc Ser B 38:290-295, 1976), which results in an estimator that is no more difficult to implement than Turnbull's estimator itself. We show that our algorithm converges and that our estimator reduces to the generalized Kaplan-Meier estimator (Beran, Nonparametric regression with randomly censored survival data, 1981) when the data are either complete or right-censored. We demonstrate by simulation that the estimator, bootstrap variance estimation and bandwidth selection (by rule of thumb or cross-validation) all perform well in finite samples. We illustrate the method by applying it to a dataset from a study on the incidence of HIV in a group of female sex workers from Kinshasa.
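A weighted self-consistency iteration of the kind described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `weighted_turnbull`, the Gaussian kernel, the normalised kernel weights, the closed-interval convention [L_i, R_i], and the use of all distinct endpoints as candidate mass points (rather than Turnbull's innermost intervals) are all choices made here for brevity.

```python
import numpy as np

def weighted_turnbull(left, right, x, x0, h, tol=1e-8, max_iter=5000):
    """Hypothetical sketch: kernel-weighted self-consistency estimate at covariate value x0."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    x = np.asarray(x, dtype=float)

    # Assumed Gaussian kernel weights, normalised to sum to one.
    k = np.exp(-0.5 * ((x - x0) / h) ** 2)
    keep = k > 0                      # drop observations whose weight underflows to zero
    left, right = left[keep], right[keep]
    w = k[keep] / k[keep].sum()

    # Simplified support: every distinct interval endpoint
    # (np.inf, if present, stands for "beyond the last finite time").
    support = np.unique(np.concatenate([left, right]))

    # alpha[i, j] = 1 when support point j lies in the closed interval [left_i, right_i].
    alpha = ((left[:, None] <= support[None, :]) &
             (support[None, :] <= right[:, None])).astype(float)

    # Start from a uniform mass vector and iterate the weighted self-consistency map.
    p = np.full(support.size, 1.0 / support.size)
    for _ in range(max_iter):
        denom = alpha @ p             # current probability assigned to each observed interval
        p_new = (w[:, None] * alpha * p[None, :] / denom[:, None]).sum(axis=0)
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    return support, p
```

With equal weights for every observation, the update is the ordinary self-consistency step; the covariate enters only through `w`. Under these assumptions, a conditional survival estimate at time `t` would be `p[support > t].sum()`, and the bandwidth `h` would be chosen by a rule of thumb or cross-validation as the abstract indicates.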