Sorribas Albert, March Jaume, Trujillano Javier
Departament de Ciències Mèdiques Bàsiques, Universitat de Lleida, Av. Rovira Roure 44, 25198-Lleida, Spain.
Stat Med. 2002 May 15;21(9):1213-35. doi: 10.1002/sim.1086.
Receiver operating characteristic (ROC) curves provide a method for evaluating the performance of a diagnostic test. These curves represent the true positive ratio, that is, the true positives among those affected by the disease, as a function of the false positive ratio, that is, the false positives among the healthy, corresponding to each possible value of the diagnostic variable. When the diagnostic variable is continuous, the corresponding ROC curve is also continuous. However, estimating such a curve from sample data yields a step-line, unless some assumption is made about the underlying distribution of the variable considered. Since the actual distribution of the diagnostic test is seldom known, it is difficult to select an appropriate distribution for practical use. Data transformation may offer a solution, but it may also distort the evaluation of the diagnostic test. In this paper we show that the distribution family known as the S-distribution can be used to solve this problem. The S-distribution is defined as a differential equation in which the dependent variable is the cumulative distribution function. This special form provides a highly flexible family of distributions that can be used as models for unknown distributions. It has been shown that classical statistical distributions can be represented accurately as S-distributions and that they occupy a definite subspace of the parameter space of the whole S-distribution family. Consequently, the S-distribution also provides many other distributional forms that do not correspond to known distributions. This property can be used to model observed data from unknown distributions and is very useful for constructing parametric ROC curves in those cases. After fitting an S-distribution to the observed samples of diseased and healthy populations, ROC curve computation is straightforward.
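The abstract does not write the defining equation out. For orientation, the four-parameter form in which the S-distribution is commonly stated is sketched below; the symbols $\alpha$, $g$, $h$, $X_0$ and $F_0$ are assumptions of this note, not notation taken from the abstract.

```latex
% S-distribution: an ODE for the cumulative distribution function F(X).
% alpha > 0 sets the scale, g < h set the shape, (X_0, F_0) fixes the location.
\frac{dF}{dX} = \alpha \left( F^{g} - F^{h} \right), \qquad F(X_0) = F_0 .
```

As one familiar special case, $\alpha = 1$, $g = 1$, $h = 2$ gives $dF/dX = F(1 - F)$, whose solution is the standard logistic cumulative distribution function.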
A ROC curve can be considered as the solution of a differential equation in which the dependent variable is the ratio of true positives and the independent variable is the ratio of false positives. This equation can be easily obtained from the S-distributions fitted to the observed data. Using these results, we can compute pointwise confidence bands for the ROC curve and for the corresponding area under the curve. We compare this approach with the empirical and the binormal methods for estimating a ROC curve to show that the S-distribution-based method is a useful parametric procedure.
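As a rough illustration of why ROC computation is straightforward once both distributions are available, the sketch below integrates two S-distribution cumulatives numerically and pairs them threshold by threshold. It is not the authors' procedure: the equation form dF/dx = alpha * (F**g - F**h), the function names `s_cdf`, `roc_points` and `auc`, the parameter values, and the convention that larger test values indicate disease are all assumptions made for the example, and no fitting step or confidence band is included.

```python
def s_rhs(F, alpha, g, h):
    """Right-hand side of the assumed S-distribution ODE dF/dx = alpha*(F**g - F**h)."""
    return alpha * (F**g - F**h)


def s_cdf(x, alpha, g, h, x0, F0=0.5, steps=400):
    """Integrate the ODE from the anchor (x0, F0) to x with classical RK4."""
    if x == x0:
        return F0
    dx = (x - x0) / steps  # negative when x lies to the left of x0
    F = F0
    for _ in range(steps):
        k1 = s_rhs(F, alpha, g, h)
        k2 = s_rhs(F + 0.5 * dx * k1, alpha, g, h)
        k3 = s_rhs(F + 0.5 * dx * k2, alpha, g, h)
        k4 = s_rhs(F + dx * k3, alpha, g, h)
        F += dx * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        # Keep F strictly inside (0, 1) so fractional powers stay real.
        F = min(max(F, 1e-12), 1 - 1e-12)
    return F


def roc_points(healthy, diseased, lo, hi, n=201):
    """(FPR, TPR) pairs for thresholds c on [lo, hi]; a test is positive when X > c."""
    pts = []
    for i in range(n):
        c = lo + (hi - lo) * i / (n - 1)
        fpr = 1 - s_cdf(c, **healthy)   # healthy subjects above the threshold
        tpr = 1 - s_cdf(c, **diseased)  # diseased subjects above the threshold
        pts.append((fpr, tpr))
    return sorted(pts)


def auc(pts):
    """Trapezoidal area under the ROC points, padded with (0, 0) and (1, 1)."""
    pts = [(0.0, 0.0)] + pts + [(1.0, 1.0)]
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical parameters: g=1, h=2, alpha=1 is the logistic special case,
# with the diseased population shifted two units to the right.
healthy = dict(alpha=1.0, g=1.0, h=2.0, x0=0.0)
diseased = dict(alpha=1.0, g=1.0, h=2.0, x0=2.0)
curve = roc_points(healthy, diseased, lo=-12.0, hi=14.0)
print(f"AUC = {auc(curve):.3f}")
```

Under the same assumptions, dividing the two density equations gives the slope of the curve directly, dTPR/dFPR = f_D(c) / f_H(c), where each density is the right-hand side of its own S-distribution equation evaluated at F = 1 - TPR or F = 1 - FPR. The sketch integrates over the threshold instead because that slope is a 0/0 form at both ends of the curve.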