Buyse M, Molenberghs G
International Institute for Drug Development, Brussels, Belgium.
Biometrics. 1998 Sep;54(3):1014-29.
The validation of surrogate endpoints has been studied by Prentice (1989, Statistics in Medicine 8, 431-440) and Freedman, Graubard, and Schatzkin (1992, Statistics in Medicine 11, 167-178). We extend their proposals to the cases where the surrogate and the final endpoints are both binary or both normally distributed. Let T and S be random variables denoting the true and surrogate endpoints, respectively, and let Z be an indicator variable for treatment. Prentice's criteria are fulfilled if Z has a significant effect on T and on S, if S has a significant effect on T, and if Z has no effect on T given S. Freedman et al. relaxed the last criterion by estimating PE, the proportion of the effect of Z on T that is explained by S, and by requiring that the lower confidence limit of PE be larger than some proportion, say 0.5 or 0.75. This condition can be verified only if the treatment has a very highly significant effect on the true endpoint, which is rare. We argue that two other quantities must be considered in the validation of a surrogate endpoint: RE, the effect of Z on T relative to that of Z on S, and γ_Z, the association between S and T after adjustment for Z. A surrogate is said to be perfect at the individual level when there is a perfect association between the surrogate and the final endpoint after adjustment for treatment. A surrogate is said to be perfect at the population level if RE is 1. A perfect surrogate fulfills both conditions, in which case S and T are identical up to a deterministic transformation. Fieller's theorem is used to estimate PE, RE, and their respective confidence intervals. Logistic regression models and the global odds ratio model studied by Dale (1986, Biometrics 42, 909-917) are used for binary endpoints. Linear models are used for continuous endpoints. The validation of surrogate endpoints is shown to require large numbers of observations if it is to be of practical value.
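Because PE and RE are both built from ratios of estimated treatment effects, a Fieller-type interval is the natural tool for them. The sketch below is an illustration only, not the authors' code. It assumes a parameterization the abstract does not spell out: alpha is the effect of Z on S, beta is the effect of Z on T, and beta_S is the effect of Z on T after adjustment for S, so that RE = beta / alpha and PE = 1 - beta_S / beta. The function name `fieller_interval`, the normal quantile 1.96, and all numerical inputs are hypothetical.

```python
import math


def fieller_interval(a, b, var_a, var_b, cov_ab, z=1.96):
    """Fieller confidence set for the ratio a / b.

    a, b      : point estimates of numerator and denominator
    var_a/b   : their estimated variances
    cov_ab    : their estimated covariance
    z         : normal quantile (1.96 gives roughly 95% coverage)

    Returns (lower, upper) when the set is a bounded interval, and None
    when the denominator is not significantly different from zero, in
    which case the Fieller set is unbounded.
    """
    z2 = z * z
    # The set is {theta : (a - theta*b)^2 <= z^2 * Var(a - theta*b)},
    # i.e. qa*theta^2 + qb*theta + qc <= 0 with the coefficients below.
    qa = b * b - z2 * var_b
    qb = -2.0 * (a * b - z2 * cov_ab)
    qc = a * a - z2 * var_a
    if qa <= 0:
        return None
    # Guard against tiny negative values caused by rounding.
    disc = max(qb * qb - 4.0 * qa * qc, 0.0)
    root = math.sqrt(disc)
    return ((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa))


# Hypothetical estimates, made up purely for illustration.
beta_hat, alpha_hat = 1.0, 2.0
re_interval = fieller_interval(beta_hat, alpha_hat, 0.04, 0.04, 0.0)
print("RE estimate:", beta_hat / alpha_hat, "interval:", re_interval)

# PE = 1 - beta_S / beta: take the interval for beta_S / beta and flip it.
beta_s_hat = 0.3
ratio = fieller_interval(beta_s_hat, beta_hat, 0.05, 0.04, 0.02)
pe_interval = None if ratio is None else (1.0 - ratio[1], 1.0 - ratio[0])
print("PE estimate:", 1.0 - beta_s_hat / beta_hat, "interval:", pe_interval)
```

The `None` branch reflects the point made above about PE: when the treatment effect in the denominator is not clearly separated from zero, the confidence set for the ratio is unbounded, so no useful lower limit can be stated.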