Nearest-Neighbor Estimation for ROC Analysis under Verification Bias.

Author information

Adimari Gianfranco, Chiogna Monica

Publication information

Int J Biostat. 2015 May;11(1):109-24. doi: 10.1515/ijb-2014-0014.

DOI:10.1515/ijb-2014-0014

Abstract

For a continuous-scale diagnostic test, the receiver operating characteristic (ROC) curve is a popular tool for displaying the ability of the test to discriminate between healthy and diseased subjects. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the test result and other characteristics of the subjects. Estimators of the ROC curve based only on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias, in particular under the assumption that the true disease status, if missing, is missing at random (MAR). The MAR assumption means that the probability of missingness depends on the true disease status only through the test result and observed covariate information. However, the existing methods require parametric models for the (conditional) probability of disease and/or the (conditional) probability of verification, and hence are subject to model misspecification: a wrong specification of such parametric models can affect the behavior of the estimators, which can then be inconsistent. To avoid misspecification problems, in this paper we propose a fully nonparametric method for the estimation of the ROC curve of a continuous test under verification bias. The method is based on nearest-neighbor imputation and adopts generic smooth regression models for both the probability that a subject is diseased and the probability that the subject is verified. Simulation experiments and an illustrative example show the usefulness of the new method. Variance estimation is also discussed.
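
The idea of nearest-neighbor imputation under MAR can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the estimator's exact form. It assumes that each subject's probability of disease is estimated by the mean disease status of its K nearest verified subjects (Euclidean distance on the test result and covariates), and that the true and false positive rates at a cutoff are weighted proportions built from those estimates. The function names `knn_disease_prob` and `roc_points`, the choice K = 1 in the toy data, and the distance measure are all assumptions made for illustration.

```python
import numpy as np


def knn_disease_prob(features, disease, verified, k=3):
    """Estimate P(D = 1 | features) for every subject as the mean disease
    status of its k nearest verified subjects (assumed Euclidean distance)."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features[:, None]          # a single feature: the test result
    verified = np.asarray(verified, dtype=bool)
    disease = np.asarray(disease, dtype=float)

    ver_x = features[verified]                # features of verified subjects
    ver_d = disease[verified]                 # their observed disease status

    # Pairwise distances from every subject to every verified subject.
    dist = np.linalg.norm(features[:, None, :] - ver_x[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return ver_d[nearest].mean(axis=1)


def roc_points(test, rho, cutoffs):
    """Bias-corrected (FPR, TPR) pairs, weighting each subject by its
    estimated disease probability rho (assumed form of the estimator)."""
    test = np.asarray(test, dtype=float)
    rho = np.asarray(rho, dtype=float)
    above = test[None, :] >= np.asarray(cutoffs, dtype=float)[:, None]
    tpr = (above * rho).sum(axis=1) / rho.sum()
    fpr = (above * (1.0 - rho)).sum(axis=1) / (1.0 - rho).sum()
    return fpr, tpr


# Toy data (hypothetical): disease status is missing (NaN) when unverified.
test = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
disease = np.array([0.0, 0.0, np.nan, np.nan, 1.0, 1.0])
verified = np.array([1, 1, 0, 0, 1, 1])

rho = knn_disease_prob(test, disease, verified, k=1)
fpr, tpr = roc_points(test, rho, cutoffs=[0.5])
```

In this sketch a verified subject counts as its own nearest neighbor, and no covariate standardization is applied. Both are simplifications, and the rates are undefined when the estimated probabilities are all 0 or all 1.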
