Bantis Leonidas E, Yan Qingxiang, Tsimikas John V, Feng Ziding
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, 77030, TX, U.S.A.
Department of Mathematics, Division of Statistics and Data Analysis, University of the Aegean, Samos, 83200, Greece.
Stat Med. 2017 Oct 30;36(24):3830-3843. doi: 10.1002/sim.7394. Epub 2017 Aug 7.
Protein biomarkers found in plasma are commonly used for cancer screening and early detection. Measurements of such markers are often obtained from different assays that cannot provide accurate values beyond a limit of detection. The ROC curve is the most popular statistical tool for the evaluation of a continuous biomarker. However, when limits of detection exist, the empirical ROC curve fails to provide a valid estimate over the whole spectrum of the false positive rate (FPR). Hence, crucial information regarding the performance of the marker at high sensitivity and/or high specificity values is not revealed. In this paper, we address this problem and propose methods for constructing ROC curve estimates for all possible FPR values. We explore flexible parametric methods, transformations to normality, and robust kernel-based and spline-based approaches. We evaluate our methods through simulations and illustrate them using colorectal and pancreatic cancer data.
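The limitation described in the abstract can be illustrated with a minimal sketch. It is not the authors' method and does not implement any of the proposed estimators; it only shows why an empirical ROC curve stops short when a lower limit of detection censors the data. The function name `empirical_roc_lower_lod`, the toy values, and the conventions that higher marker values indicate disease and that censored values are recorded at the limit are all assumptions made for illustration.

```python
import numpy as np


def empirical_roc_lower_lod(controls, cases, lod):
    """Empirical ROC points that remain identifiable under a lower LOD.

    Assumptions (illustrative, not from the paper): higher marker values
    indicate disease, and any value below `lod` is recorded as `lod`.
    Only thresholds c >= lod can be evaluated, because values below the
    limit are indistinguishable from one another.
    """
    controls = np.asarray(controls, dtype=float)
    cases = np.asarray(cases, dtype=float)

    # Censor: everything below the limit of detection is set to the limit.
    x0 = np.maximum(controls, lod)
    x1 = np.maximum(cases, lod)

    # Candidate thresholds are the observed (censored) values, in
    # descending order so that the FPR increases along the returned arrays.
    thresholds = np.unique(np.concatenate([x0, x1]))[::-1]

    # A subject is called positive when the marker exceeds the threshold.
    fpr = np.array([(x0 > c).mean() for c in thresholds])
    tpr = np.array([(x1 > c).mean() for c in thresholds])
    return fpr, tpr


# Toy data (hypothetical values, chosen only to make the effect visible).
controls = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cases = [3, 6, 7, 8, 9, 10, 11, 12, 13, 14]

fpr, tpr = empirical_roc_lower_lod(controls, cases, lod=3.5)
print("largest FPR reached:", fpr.max())
print("TPR at that FPR:", tpr[fpr.argmax()])
```

In this sketch the curve ends at the proportion of controls above the limit, so nothing can be said empirically about the ROC at larger FPR values, which is the high-sensitivity region. An upper limit of detection would truncate the opposite end of the curve in the same way.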