Evans Ciaran, G'Sell Max
Department of Statistical Sciences, Wake Forest University, Winston-Salem, NC, United States of America.
Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA, United States of America.
PLoS One. 2024 Sep 16;19(9):e0310194. doi: 10.1371/journal.pone.0310194. eCollection 2024.
Classifiers have been developed to help diagnose dengue fever in patients presenting with febrile symptoms. However, classifier predictions often rely on the assumption that new observations come from the same distribution as the training data. If the population prevalence of dengue changes, as it would during an outbreak, it is important to raise an alarm as soon as possible, both so that appropriate public health measures can be taken and so that the classifier can be recalibrated. In this paper, we consider the problem of detecting such a change in distribution in sequentially observed, unlabeled classification data. We focus on label shift, in which the class priors change but the class-conditional distributions remain the same. We reduce this problem to detecting a change in the one-dimensional classifier scores, which leads to simple nonparametric sequential changepoint detection procedures. Our procedures use the classifier training data to estimate the detection statistic, and they converge to their parametric counterparts as the size of the training data grows. In simulated outbreaks with real dengue data, we show that our method outperforms other detection procedures in this label shift setting.
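The reduction to one-dimensional scores can be illustrated with a short sketch. Under label shift, a classifier score s has density g_pi(s) = pi * f1(s) + (1 - pi) * f0(s), where pi is the dengue prevalence and f0, f1 are the class-conditional score densities, which do not change. A change in prevalence therefore appears as a change in the distribution of the scores alone. The Python below is a minimal sketch under stated assumptions, not the authors' procedure: it assumes binary labels, scores in [0, 1], a known pre-change prevalence pi0 and a hypothetical post-change prevalence pi1, plug-in histogram estimates of f0 and f1 from training scores, and a standard one-sided CUSUM. The function names, bin count, threshold, and simulated data are illustrative choices.

```python
import math
import random
from typing import List, Optional, Sequence


def bin_index(score: float, n_bins: int) -> int:
    """Map a score in [0, 1] to a histogram bin, clamping out-of-range values."""
    return min(max(int(score * n_bins), 0), n_bins - 1)


def histogram_density(scores: Sequence[float], n_bins: int = 10) -> List[float]:
    """Piecewise-constant density estimate of classifier scores on [0, 1].

    Each bin starts with a pseudo-count of 1 (Laplace smoothing) so every
    density value is strictly positive and log-ratios stay finite.
    """
    counts = [1.0] * n_bins
    for s in scores:
        counts[bin_index(s, n_bins)] += 1.0
    total = sum(counts)
    width = 1.0 / n_bins
    return [c / (total * width) for c in counts]


def cusum_label_shift(
    stream: Sequence[float],
    f0: Sequence[float],
    f1: Sequence[float],
    pi0: float,
    pi1: float,
    threshold: float,
) -> Optional[int]:
    """One-sided CUSUM on classifier scores under a label shift model.

    Illustrative sketch, not the paper's exact statistic. The score density
    is the mixture g_pi(s) = pi * f1(s) + (1 - pi) * f0(s) with f0, f1 fixed;
    the statistic accumulates log(g_pi1(s) / g_pi0(s)) and is floored at 0.
    Returns the 0-based index of the first alarm, or None if no alarm.
    """
    n_bins = len(f0)
    w = 0.0
    for t, s in enumerate(stream):
        i = bin_index(s, n_bins)
        g_pre = pi0 * f1[i] + (1.0 - pi0) * f0[i]   # density before the change
        g_post = pi1 * f1[i] + (1.0 - pi1) * f0[i]  # density after the change
        w = max(0.0, w + math.log(g_post / g_pre))
        if w > threshold:
            return t
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical training scores: class 0 = no dengue, class 1 = dengue.
    train0 = [rng.betavariate(2, 5) for _ in range(500)]
    train1 = [rng.betavariate(5, 2) for _ in range(500)]
    f0, f1 = histogram_density(train0), histogram_density(train1)

    def draw(pi: float) -> float:
        """Draw one score from the label-shift mixture with prevalence pi."""
        return rng.betavariate(5, 2) if rng.random() < pi else rng.betavariate(2, 5)

    # Hypothetical outbreak: prevalence jumps from 10% to 40% after index 199.
    stream = [draw(0.1) for _ in range(200)] + [draw(0.4) for _ in range(200)]
    alarm = cusum_label_shift(stream, f0, f1, pi0=0.1, pi1=0.4, threshold=5.0)
    print("first alarm at index:", alarm)
```

Raising the threshold lowers the false-alarm rate at the cost of slower detection. As the training sets grow, the histogram estimates approach the binned class-conditional densities, so this plug-in statistic roughly approaches the CUSUM one would compute if f0 and f1 were known.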