Hougaard P
Novo Nordisk, Bagsvaerd, Denmark.
Biometrics. 1999 Mar;55(1):13-22. doi: 10.1111/j.0006-341x.1999.00013.x.
Survival data stand out as a special statistical field. This paper tries to describe what survival data are and what makes them so special. Survival data concern times to events. A key point is the successive observation of time. On the one hand, this leads to some times not being observed, so that all that is known is that they exceed some given times (censoring). On the other hand, it implies that predictions regarding the future course should be conditional on the present status (truncation). In the simplest case, this condition is that the individual is alive. The successive conditioning makes the hazard function the most relevant concept: it describes the probability of an event happening during a short interval given that the individual is alive today (or, more generally, able to experience the event). The standard distributions available (normal, log-normal, gamma, inverse Gaussian, and so forth) can account for censoring and truncation, but this is cumbersome. Besides, they fit badly because they are either symmetric or right-skewed, whereas survival time distributions can easily be left-skewed positive variables. A few distributions satisfying these requirements are available, but nonparametric methods are often preferable because they account better conceptually for truncation and censoring and give a better fit. Finally, we compare proportional hazards regression models with accelerated failure time models.
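As an illustration of how a nonparametric method can handle censoring and truncation through the hazard, the following is a minimal sketch of a product-limit (Kaplan-Meier type) estimator. It is not taken from the paper: the function name `kaplan_meier`, its arguments, and the toy data are assumptions made for this example. At each distinct event time, the discrete hazard is estimated as the number of events divided by the number of individuals at risk, where "at risk" means having entered observation before that time (truncation) and not yet having had the event or been censored.

```python
def kaplan_meier(times, events, entries=None):
    """Product-limit estimate of the survival function.

    times:   observed time per subject (event time or censoring time)
    events:  1 if the event was observed at that time, 0 if censored
    entries: optional delayed-entry times (left truncation); defaults to 0
    Returns a list of (time, hazard, survival) for each distinct event time.
    """
    if entries is None:
        entries = [0.0] * len(times)

    # Only times at which an event was actually observed change the estimate.
    event_times = sorted({t for t, d in zip(times, events) if d})

    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = 0
        deaths = 0
        for entry, time, event in zip(entries, times, events):
            # At risk at t: entered before t and still under observation at t.
            if entry < t <= time:
                at_risk += 1
                if event and time == t:
                    deaths += 1
        if at_risk == 0:
            continue  # nobody under observation, no information at this time
        hazard = deaths / at_risk      # conditional probability of the event at t
        survival *= 1.0 - hazard       # successive conditioning on being alive
        curve.append((t, hazard, survival))
    return curve


if __name__ == "__main__":
    # Hypothetical toy data: five subjects, two of them censored.
    times = [2, 3, 3, 5, 8]
    events = [1, 0, 1, 1, 0]
    for t, h, s in kaplan_meier(times, events):
        print(f"t={t}: hazard={h:.3f}, survival={s:.3f}")
```

With these toy data the estimated hazards at times 2, 3, and 5 are 1/5, 1/4, and 1/2, giving survival estimates of roughly 0.8, 0.6, and 0.3. Censored subjects contribute to the risk set up to their censoring time but never count as events, which is how the estimator uses the partial information that their true time exceeds the censoring time.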