Li Gang, Wang Xiaoyan
Departments of Biostatistics and Biomathematics, University of California, Los Angeles, CA.
Division of General Internal Medicine and Health Services Research, University of California, Los Angeles, CA.
J Am Stat Assoc. 2019;114(528):1815-1825. doi: 10.1080/01621459.2018.1515079. Epub 2019 Mar 11.
This article develops a pair of new prediction summary measures for a nonlinear prediction function with right-censored time-to-event data. The first measure, defined as the proportion of variance explained by a linearly corrected prediction function, quantifies the potential predictive power of the nonlinear prediction function. The second measure, defined as the proportion of prediction error explained by the corrected prediction function, gauges how close the prediction function is to its corrected version. It serves as a supplementary measure: a value less than 1 indicates that the correction is needed for the prediction function to fulfill its potential predictive power, and the value quantifies how much prediction error reduction the correction can achieve. Together, the two measures provide a complete summary of the predictive accuracy of the nonlinear prediction function. We motivate these measures by first establishing a variance decomposition and a prediction error decomposition at the population level and then deriving uncensored and censored sample versions of these decompositions. We note that for the least squares prediction function under the linear model with no censoring, the first measure reduces to the classical coefficient of determination and the second measure degenerates to 1. We show that the sample measures are consistent estimators of their population counterparts and conduct extensive simulations to investigate their finite sample properties. A real data illustration is provided using the primary biliary cholangitis (PBC) data. An R package, PAmeasures, has been developed and made available via CRAN. Supplementary materials for this article are available online.
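The two measures described in the abstract can be illustrated for the uncensored case with a minimal Python sketch. It assumes the linear correction is an ordinary least squares regression of the observed outcome on the predicted value, which is one reading of the abstract rather than the article's own code. The function names `linear_correction` and `prediction_summary` are hypothetical and are not part of the PAmeasures package. The censored sample versions developed in the article are not reproduced here.

```python
from statistics import fmean


def linear_correction(y, pred):
    """Least squares fit of y on pred; returns (intercept, slope).

    Assumes pred is not constant, so the slope is well defined.
    """
    y_bar, p_bar = fmean(y), fmean(pred)
    s_pp = sum((p - p_bar) ** 2 for p in pred)
    s_py = sum((p - p_bar) * (v - y_bar) for p, v in zip(pred, y))
    slope = s_py / s_pp
    return y_bar - slope * p_bar, slope


def prediction_summary(y, pred):
    """Return (first measure, second measure) for uncensored data.

    first  = variance explained by the corrected prediction / total variance
    second = prediction error of the corrected prediction
             / prediction error of the original prediction
    Assumes y is not constant and pred does not fit y exactly.
    """
    a, b = linear_correction(y, pred)
    corrected = [a + b * p for p in pred]
    y_bar = fmean(y)
    total = sum((v - y_bar) ** 2 for v in y)
    explained = sum((c - y_bar) ** 2 for c in corrected)
    err_corrected = sum((v - c) ** 2 for v, c in zip(y, corrected))
    err_original = sum((v - p) ** 2 for v, p in zip(y, pred))
    return explained / total, err_corrected / err_original


# Illustrative numbers only (not taken from the article).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [2.0, 2.0, 4.0, 4.0, 6.0]  # a prediction that is biased upward
r2, l2 = prediction_summary(y, pred)
print(f"first measure = {r2:.3f}, second measure = {l2:.3f}")
```

In this toy example the second measure falls below 1, which signals that the linear correction would reduce prediction error. If the prediction passed in is already the least squares fit, the correction leaves it unchanged and the second measure equals 1 up to rounding, in line with the special case noted in the abstract.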