IEEE Trans Cybern. 2021 Jul;51(7):3483-3495. doi: 10.1109/TCYB.2020.2989213. Epub 2021 Jun 23.
In nonstationary environments, a model can easily be influenced by unseen factors and fail to fit a changing data distribution. In a classification scenario, this is known as concept drift. For instance, the shopping preferences of customers may change after they move from one city to another. A shopping website or application should therefore adjust its recommendations when its predictions of such users' behavior become less accurate. In this article, we propose a novel approach called the multiscale drift detection test (MDDT) that efficiently localizes abrupt drift points when feature values fluctuate, which signals that the current model needs immediate adaptation. MDDT is based on a resampling scheme and a paired Student's t-test. It applies a detection procedure on two different scales. Initially, detection is performed on a broad scale to check whether recently gathered drift indicators remain stationary. If a drift is claimed, a narrow-scale detection is performed to trace the change time more precisely. This multiscale structure avoids the heavy time cost of constant checking and filters noise in the drift indicators. Experiments on synthetic and real-world datasets compare the proposed method with several existing algorithms. The results indicate that MDDT outperforms the other methods on datasets with abrupt shifts and achieves the highest recall in localizing drift points.
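The two-scale procedure described above can be illustrated with a minimal Python sketch. It is not the published MDDT algorithm: the abstract does not specify the resampling scheme, the window sizes, or the decision rule, so everything concrete here is an assumption made for illustration. In particular, the function names (`group_means`, `paired_t`, `detect_abrupt_drift`), the shuffle-and-split resampling, the window lengths `broad` and `narrow`, and the threshold `t_crit` (roughly the two-sided 1% critical value of the t distribution with 9 degrees of freedom) are all hypothetical choices. The drift indicators are assumed to be a list of numbers such as per-sample error values.

```python
import math
import random
from statistics import mean, stdev


def group_means(window, groups, rng):
    """Resampling step (assumed form): shuffle the window, split it into
    `groups` equal-sized chunks, and return the mean of each chunk."""
    shuffled = list(window)
    rng.shuffle(shuffled)
    size = len(shuffled) // groups  # assumes len(window) >= groups
    return [mean(shuffled[i * size:(i + 1) * size]) for i in range(groups)]


def paired_t(xs, ys):
    """Paired t statistic: mean of the differences over its standard error."""
    diffs = [x - y for x, y in zip(xs, ys)]
    centre, spread = mean(diffs), stdev(diffs)
    if spread == 0.0:
        # Degenerate case: every difference is identical.
        return 0.0 if centre == 0.0 else math.copysign(math.inf, centre)
    return centre / (spread / math.sqrt(len(diffs)))


def window_t(before, after, groups, rng):
    """Absolute paired t statistic between two resampled windows."""
    return abs(paired_t(group_means(before, groups, rng),
                        group_means(after, groups, rng)))


def detect_abrupt_drift(indicators, broad=200, narrow=20, groups=10,
                        t_crit=3.25, seed=0):
    """Return (detection_index, estimated_change_index), or None.

    Assumes groups >= 2 and groups <= narrow <= broad.
    """
    rng = random.Random(seed)
    # Broad scale: test only once every `broad` indicators, comparing the
    # latest window with the window that precedes it.
    for end in range(2 * broad, len(indicators) + 1, broad):
        reference = indicators[end - 2 * broad:end - broad]
        recent = indicators[end - broad:end]
        if window_t(reference, recent, groups, rng) <= t_crit:
            continue  # indicators still look stationary
        # Narrow scale: scan candidate split points inside the recent window
        # and keep the one whose two sides differ the most.
        best_point, best_t = end - broad, -1.0
        for c in range(end - broad, end, narrow):
            t = window_t(indicators[c - narrow:c], indicators[c:c + narrow],
                         groups, rng)
            if t > best_t:
                best_point, best_t = c, t
        return end, best_point
    return None


if __name__ == "__main__":
    # Synthetic indicators with an abrupt shift at index 450 (illustrative only).
    noise = random.Random(1)
    stream = ([noise.gauss(0.1, 0.05) for _ in range(450)]
              + [noise.gauss(0.6, 0.05) for _ in range(350)])
    print(detect_abrupt_drift(stream))
```

In this sketch the broad-scale test runs once per `broad` indicators, which reflects the abstract's point about avoiding constant checking, and the narrow-scale scan runs only after the broad test claims a drift. The estimated change point is resolved only to the nearest multiple of `narrow`.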