Department of Computational Sciences, Wigner Research Centre for Physics, Budapest, 1121, Hungary.
János Szentágothai Doctoral School of Neurosciences, Semmelweis University, Üllői road 26, Budapest, 1085, Hungary.
Sci Rep. 2022 Jan 7;12(1):227. doi: 10.1038/s41598-021-03526-y.
Recognition of anomalous events is a challenging but critical task in many scientific and industrial fields, especially when the properties of the anomalies are unknown. In this paper, we introduce a new anomaly concept called the "unicorn", or unique event, and present a new model-free, unsupervised algorithm for detecting unicorns. The key component of the algorithm is the Temporal Outlier Factor (TOF), which measures the uniqueness of events in continuous data sets from dynamic systems. Unique events differ from traditional outliers in important respects: repetitive outliers are not unique events, and a unique event is not necessarily an outlier, since it need not fall outside the distribution of normal activity. We examined how well the algorithm recognizes unique events in different types of simulated data sets with anomalies and compared it with the Local Outlier Factor (LOF) and discord discovery algorithms. TOF outperformed LOF and discord discovery even in recognizing traditional outliers, and it also detected unique events that those algorithms missed. The benefits of the unicorn concept and the new detection method are illustrated on example data sets from very different scientific fields. The algorithm successfully retrieved unique events in cases where they were already known, such as the gravitational waves of a binary black hole merger in LIGO detector data and the signs of respiratory failure in ECG data series. Furthermore, unique events were found in the LIBOR data set covering the last 30 years.
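The abstract does not state how TOF is calculated, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the series is first time-delay embedded, that the k nearest neighbours of each embedded point are found in that state space, and that TOF is the root-mean-square temporal distance between a point and those neighbours, so that low values mark candidate unique events. The function names, the parameters dim, tau and k, and the synthetic test signal are all hypothetical choices made for illustration.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Time-delay embedding: row t is (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this dim and tau")
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])


def temporal_outlier_factor(x, dim=3, tau=1, k=4):
    """Assumed TOF: RMS time distance to the k nearest state-space neighbours."""
    emb = delay_embed(x, dim, tau)
    n = len(emb)
    if n <= k:
        raise ValueError("need more embedded points than neighbours")
    # Brute-force pairwise squared Euclidean distances (fine for short series).
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    # Indices of the k nearest neighbours of every point (unordered).
    nn = np.argpartition(d2, k - 1, axis=1)[:, :k]
    t = np.arange(n)[:, None]
    return np.sqrt(np.mean((t - nn) ** 2, axis=1))


# Hypothetical test signal: a periodic sine with one non-repeating bump.
t = np.arange(1000)
x = np.sin(2 * np.pi * t / 50)
x[500:520] += 1.5 * np.hanning(20)  # the single "unique event"

tof = temporal_outlier_factor(x, dim=3, tau=1, k=4)
print("index of lowest TOF:", int(np.argmin(tof)))
print("lowest TOF:", float(tof.min()), "median TOF:", float(np.median(tof)))
```

In this toy signal, points on the repeating sine have state-space neighbours from other periods, which are far away in time, whereas points inside the bump only have neighbours from the bump itself. That contrast is what the sketch relies on. The brute-force distance matrix is used to keep the example dependency-free; a tree-based neighbour search would be the more usual choice for long recordings.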