

NFAD: fixing anomaly detection using normalizing flows.

Authors

Ryzhikov Artem, Borisyak Maxim, Ustyuzhanin Andrey, Derkach Denis

Affiliation

Laboratory of Methods for Big Data Analysis, HSE University, Moscow, Russia.

Publication

PeerJ Comput Sci. 2021 Nov 18;7:e757. doi: 10.7717/peerj-cs.757. eCollection 2021.

Abstract

Anomaly detection is a challenging task that arises in practically all areas of industry and science, from fraud detection and data quality monitoring to finding rare cases of diseases and searching for new physics. Most conventional approaches to anomaly detection, such as one-class SVM and Robust Auto-Encoder, are one-class classification methods, i.e., they focus on separating normal data from the rest of the space. Such methods assume that the normal and anomalous classes are separable, and consequently do not take into account any available samples of anomalies. In practical settings, however, some anomalous samples are often available, though usually in amounts far lower than a balanced classification task requires, and the separability assumption might not always hold. This leads to an important task: incorporating known anomalous samples into the training procedures of anomaly detection models. In this work, we propose a novel model-agnostic training procedure to address this task. We reformulate one-class classification as a binary classification problem in which normal data are distinguished from pseudo-anomalous samples. The pseudo-anomalous samples are drawn from low-density regions of a normalizing flow model by feeding tails of the latent distribution into the model. This approach makes it easy to include known anomalies in the training process of an arbitrary classifier. We demonstrate that our approach shows comparable performance on one-class problems and, most importantly, achieves comparable or superior results on tasks with variable amounts of known anomalies.
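The sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a fitted affine map (mean plus Cholesky factor) stands in for a trained normalizing flow, "tails of the latent distribution" is interpreted here as latent points whose norm exceeds a chosen radius, and all function names and the `min_radius` parameter are hypothetical.

```python
import numpy as np

def fit_affine_flow(x):
    """Toy stand-in for a normalizing flow: x = mu + z @ L.T with z ~ N(0, I).
    Assumes x has at least two feature columns."""
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    # Small jitter keeps the Cholesky factorization numerically stable.
    L = np.linalg.cholesky(cov + 1e-6 * np.eye(x.shape[1]))
    return mu, L

def sample_latent_tails(n, dim, min_radius, rng):
    """Rejection-sample z ~ N(0, I) restricted to ||z|| >= min_radius."""
    kept, count = [], 0
    while count < n:
        z = rng.standard_normal((4 * n, dim))
        z = z[np.linalg.norm(z, axis=1) >= min_radius]
        kept.append(z)
        count += len(z)
    return np.concatenate(kept)[:n]

def build_training_set(x_normal, x_known_anomalies, n_pseudo, min_radius, rng):
    """Assemble a binary dataset: normal data (label 0) versus
    pseudo-anomalies plus any known anomalies (label 1)."""
    mu, L = fit_affine_flow(x_normal)
    z_tail = sample_latent_tails(n_pseudo, x_normal.shape[1], min_radius, rng)
    x_pseudo = mu + z_tail @ L.T  # push latent tails through the toy flow
    x = np.concatenate([x_normal, x_pseudo, x_known_anomalies])
    y = np.concatenate([
        np.zeros(len(x_normal)),
        np.ones(len(x_pseudo) + len(x_known_anomalies)),
    ])
    return x, y

# Illustrative usage on synthetic data (values chosen arbitrarily).
rng = np.random.default_rng(0)
x_normal = rng.normal(size=(500, 2)) * [1.0, 0.5] + [2.0, -1.0]
x_known = np.array([[8.0, 3.0]])  # a single known anomaly
x_train, y_train = build_training_set(x_normal, x_known, 200, 2.5, rng)
```

The resulting `x_train` and `y_train` can be passed to any binary classifier, which is where the model-agnostic aspect enters. In a realistic setting the affine map would be replaced by a flow trained on the normal data, and the tail criterion would be a design choice.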


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/50b1/8627226/7181927114b8/peerj-cs-07-757-g001.jpg
