Department of Statistics, Colorado State University, 102 Statistics Building, Fort Collins, Colorado 80523, USA.
Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, Colorado, USA.
BMC Med Res Methodol. 2024 Feb 8;24(1):30. doi: 10.1186/s12874-023-02139-5.
Rapidly developing tests for emerging diseases is critical for early disease monitoring. In the early stages of an epidemic, when prevalence is expected to be low, high-specificity tests are desired to avoid numerous false positives. For a new test, selecting a cutoff that classifies results as positive or negative with the desired operating characteristics, such as specificity, is challenging because little validation data with known disease status is available. Although there is ample statistical literature on estimating quantiles of a distribution, there is limited evidence on estimating extreme quantiles from limited validation data, or on the resulting test characteristics, in the disease testing context.
We propose using extreme value theory to select a cutoff with a predetermined specificity by fitting a Pareto distribution to the upper tail of the negative controls. We compared this method with five previously proposed cutoff selection methods in a data analysis and a simulation study. We analyzed COVID-19 enzyme-linked immunosorbent assay (ELISA) antibody test results from long-term care facilities and skilled nursing staff in Colorado collected between May and December 2020.
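The extreme value idea can be sketched as a peaks-over-threshold fit. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a generalized Pareto distribution (GPD) fitted by maximum likelihood to the exceedances above a threshold placed at the 80th percentile of the negative controls, and the helper name `evt_cutoff` and the `threshold_q` parameter are hypothetical. The abstract does not say how the tail threshold was chosen.

```python
import numpy as np
from scipy import stats


def evt_cutoff(neg_controls, target_spec=0.995, threshold_q=0.80):
    """Hypothetical helper: cutoff whose estimated specificity is target_spec.

    Peaks-over-threshold sketch: fit a generalized Pareto distribution (GPD)
    to the negative-control values that exceed an empirical threshold u,
    then extrapolate the tail so that P(X > cutoff) = 1 - target_spec.
    """
    x = np.asarray(neg_controls, dtype=float)
    u = np.quantile(x, threshold_q)      # tail threshold (assumed choice)
    excess = x[x > u] - u                # exceedances over u
    zeta = excess.size / x.size          # estimated P(X > u)
    tail_prob = 1.0 - target_spec        # desired P(X > cutoff)
    if tail_prob >= zeta:
        raise ValueError("target quantile lies below the threshold; lower threshold_q")
    # Maximum-likelihood GPD fit with location fixed at 0: returns (xi, 0, sigma).
    xi, _, sigma = stats.genpareto.fit(excess, floc=0)
    # P(X > c) = zeta * P(Y > c - u), so take the GPD quantile at this level.
    cond_level = 1.0 - tail_prob / zeta
    return u + stats.genpareto.ppf(cond_level, xi, loc=0, scale=sigma)


# Usage with simulated (not study) negative-control readings.
rng = np.random.default_rng(0)
neg = rng.lognormal(mean=0.0, sigma=0.5, size=100)
print(f"EVT cutoff for Sp = 0.995: {evt_cutoff(neg, 0.995):.3f}")
print(f"Largest negative control:  {neg.max():.3f}")
```

With only 100 controls, the empirical 0.995 quantile falls between the two largest observations, so it cannot look beyond the data; the fitted tail is what allows the cutoff to extrapolate past the sample maximum when the fit suggests it should.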
We found that the extreme value approach had minimal bias when targeting a specificity of 0.995, whereas the empirical quantile of the negative controls performed well when targeting a specificity of 0.95. The higher target specificity is preferable for overall test accuracy when prevalence is low; the lower target specificity is preferable when prevalence is higher and yielded less variable prevalence estimates.
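The dependence on prevalence follows from the standard identity for overall accuracy. Writing $\pi$ for prevalence, $\mathrm{Se}$ for sensitivity and $\mathrm{Sp}$ for specificity (notation assumed here, not taken from the paper):

```latex
\text{Accuracy} = \pi\,\mathrm{Se} + (1-\pi)\,\mathrm{Sp},
\qquad
\Pr(\text{false positive}) = (1-\pi)\,(1-\mathrm{Sp}).
```

For example, at $\pi = 0.01$, $\mathrm{Sp} = 0.995$ gives $(0.99)(0.005) = 0.00495$ false positives per person tested, while $\mathrm{Sp} = 0.95$ gives $(0.99)(0.05) = 0.0495$, about five times the at most $0.01$ true positives. As $\pi$ grows, the $(1-\pi)$ weight on specificity shrinks and sensitivity carries more of the accuracy.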
Although commonly used, the normal-based methods showed considerable bias compared with the empirical and extreme value theory-based methods.
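The contrast between a normal-based cutoff and the empirical quantile can be illustrated with a small simulation. This is a hypothetical sketch, not the study's simulation design: it assumes the normal-based rule takes the form mean + z·SD of the negative controls (one common form; the abstract does not name the specific variants compared), and it draws right-skewed lognormal "negative" signals purely for illustration. All function names are hypothetical.

```python
import numpy as np
from scipy import stats


def normal_cutoff(neg_controls, target_spec):
    """Normal-based cutoff: mean + z_{target_spec} * SD of the negative controls."""
    x = np.asarray(neg_controls, dtype=float)
    return x.mean() + stats.norm.ppf(target_spec) * x.std(ddof=1)


def empirical_cutoff(neg_controls, target_spec):
    """Empirical target_spec quantile of the negative controls."""
    return float(np.quantile(np.asarray(neg_controls, dtype=float), target_spec))


def realized_specificity(cutoff, true_negatives):
    """Share of true negatives classified negative (value <= cutoff)."""
    return float(np.mean(np.asarray(true_negatives) <= cutoff))


# Hypothetical setup: right-skewed (lognormal) negative-control signals.
rng = np.random.default_rng(42)
train = rng.lognormal(mean=0.0, sigma=0.5, size=100)          # small training set
reference = rng.lognormal(mean=0.0, sigma=0.5, size=200_000)  # stands in for all true negatives

for sp in (0.95, 0.995):
    for name, rule in (("normal", normal_cutoff), ("empirical", empirical_cutoff)):
        c = rule(train, sp)
        print(f"target {sp}: {name:9s} cutoff {c:.3f}, "
              f"realized Sp {realized_specificity(c, reference):.4f}")
```

Because the lognormal is right-skewed, the mean + z·SD rule tends to place the cutoff too low, so its realized specificity falls short of the target. This illustrates one way a normal assumption can bias the cutoff; it does not reproduce the paper's results.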
When determining disease testing cutoffs from small training samples, we recommend using the extreme value-based methods when targeting a high specificity and the empirical quantile when targeting a lower specificity.