Valko Michal, Kveton Branislav, Valizadegan Hamed, Cooper Gregory F, Hauskrecht Milos
INRIA Lille - Nord Europe, SequeL project, 40 avenue Halley, Villeneuve d'Ascq, France
Technicolor Labs, Palo Alto, California, USA
Proc IEEE Int Conf Data Min. 2011;2011:735-743. doi: 10.1109/ICDM.2011.40.
In this paper, we consider the problem of conditional anomaly detection, which aims to identify data instances with an unusual response or class label. We develop a new non-parametric approach for conditional anomaly detection based on the soft harmonic solution, which we use to estimate the confidence of each label and thereby detect anomalous mislabeling. We further regularize the solution to avoid detecting isolated examples and examples on the boundary of the distribution support. We demonstrate the efficacy of the proposed method in detecting unusual labels on several synthetic and UCI ML datasets, comparing it against several baseline approaches. We also evaluate the performance of our method on a real-world electronic health record dataset, where we seek to identify unusual patient-management decisions.
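To make the idea concrete, the following is a minimal sketch of one common formulation of a regularized soft harmonic solution on a similarity graph. It is not the exact procedure of the paper. The function name `soft_harmonic_scores`, the Gaussian kernel with bandwidth `sigma`, the regularizer `gamma_g`, the label-fit weight `c_l`, and the score `-y_i * l_i` are all assumptions made for illustration. Labels are assumed to be encoded as -1 and +1.

```python
import numpy as np


def soft_harmonic_scores(X, y, sigma=1.0, gamma_g=0.1, c_l=1.0):
    """Sketch of a regularized soft harmonic solution (illustrative only).

    X: (n, d) feature matrix; y: (n,) labels in {-1, +1}.
    Returns the soft labels and a per-instance anomaly score.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Gaussian similarity graph over all instances (assumed kernel choice).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Graph Laplacian plus a ridge term; the ridge shrinks the soft labels
    # of weakly connected (isolated or boundary) points toward zero.
    L = np.diag(W.sum(axis=1)) - W
    K = L + gamma_g * np.eye(n)

    # Minimize (l - y)^T C (l - y) + l^T K l, whose solution satisfies
    # (K + C) l = C y.
    C = c_l * np.eye(n)
    soft = np.linalg.solve(K + C, C @ y)

    # Assumed score: large when the soft label disagrees with the given label.
    scores = -y * soft
    return soft, scores


# Two well-separated clusters; the last point of the first cluster is
# deliberately mislabeled.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1], [5.05, 5.05]])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1, -1, -1])
soft, scores = soft_harmonic_scores(X, y)
print(int(np.argmax(scores)))
```

In this toy example the mislabeled instance is the one whose soft label disagrees in sign with its given label, so it should receive the highest score. The ridge term `gamma_g` plays the role of the additional regularization described above: a point with few strong neighbors gets a soft label near zero and is therefore not flagged with high confidence.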