Cai T, Cheng S
Department of Biostatistics, Harvard University, Boston, MA 02115, USA.
Biostatistics. 2008 Apr;9(2):216-33. doi: 10.1093/biostatistics/kxm037. Epub 2007 Dec 3.
Recent advances in technology promise to yield a multitude of tests for disease diagnosis and prognosis. When multiple sources of information are available, it is often of interest to construct a composite score that provides better classification accuracy than any individual measurement. In this paper, we consider robust procedures for optimally combining tests when test results are measured prior to disease onset and disease status evolves over time. To account for censoring of disease onset time, the most commonly used approach to combining tests for detecting subsequent disease status is to fit a proportional hazards model (Cox, 1972) and use the estimated risk score. However, simulation studies suggested that such a risk score may have poor accuracy when the proportional hazards assumption fails. We propose using a nonparametric transformation model (Han, 1987) as a working model to derive an optimal composite score with theoretical justification. We demonstrate that the proposed score is optimal when the model holds and remains optimal "on average" among linear scores even if the model fails. Time-dependent sensitivity, specificity, and receiver operating characteristic curve functions are used to quantify the accuracy of the resulting composite score. We provide consistent and asymptotically Gaussian estimators of these accuracy measures. A simple model-free resampling procedure is proposed to obtain consistent variance estimators for all of them. We illustrate the new proposals with simulation studies and an analysis of a breast cancer gene expression data set.
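As a rough illustration of the two ingredients named in the abstract, a rank-based criterion for choosing a linear composite score and time-dependent sensitivity and specificity for evaluating it, the following is a minimal sketch in Python. It is not the authors' estimator. The function names, the toy data, and the convention that a higher score means earlier disease onset are assumptions made for this example. The rank criterion is a simple concordance count over pairs whose ordering is known despite censoring, and the accuracy function drops subjects censored before the horizon instead of applying the censoring adjustments a consistent estimator would need.

```python
from typing import Sequence, Tuple


def linear_score(beta: Sequence[float], x: Sequence[float]) -> float:
    """Composite score beta'x for one subject (hypothetical helper)."""
    return sum(b * v for b, v in zip(beta, x))


def rank_objective(beta, X, time, event) -> float:
    """Fraction of usable pairs that the linear score orders concordantly.

    A pair (i, j) is usable when subject i has an observed event and
    time[i] < time[j]; it is concordant when i also has the higher score.
    Illustrative stand-in for a rank-correlation criterion, not the
    estimator from the paper.
    """
    scores = [linear_score(beta, x) for x in X]
    usable = concordant = 0
    for i in range(len(scores)):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(scores)):
            if time[i] < time[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1
    return concordant / usable if usable else float("nan")


def sens_spec_at(scores, time, event, t, c) -> Tuple[float, float]:
    """Naive time-dependent sensitivity and specificity at horizon t, cutoff c.

    Cases are subjects with an observed event by time t; controls are
    subjects still under follow-up after t. Subjects censored before t
    are dropped (assumption: no censoring weights are applied).
    """
    cases = [s for s, ti, d in zip(scores, time, event) if d and ti <= t]
    controls = [s for s, ti in zip(scores, time) if ti > t]
    sens = sum(s > c for s in cases) / len(cases) if cases else float("nan")
    spec = sum(s <= c for s in controls) / len(controls) if controls else float("nan")
    return sens, spec


# Toy data (invented for illustration): two test results per subject.
X = [[2.0, 1.0], [1.5, 0.5], [0.5, 1.0], [0.2, 0.1], [1.0, 0.0]]
time = [1.0, 2.0, 3.0, 5.0, 4.0]   # observed follow-up times
event = [1, 1, 0, 0, 1]            # 1 = disease onset observed, 0 = censored
beta = [1.0, 0.5]                  # candidate combination weights

scores = [linear_score(beta, x) for x in X]
objective = rank_objective(beta, X, time, event)
sens, spec = sens_spec_at(scores, time, event, t=2.5, c=2.0)
```

In practice one would search over beta (up to scale) to maximize the rank criterion and then trace sensitivity against one minus specificity over all cutoffs c to obtain a time-dependent ROC curve at each horizon t.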