Geometric anomaly detection in data.

Affiliations

Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom.

The Alan Turing Institute, British Library, London NW1 2DB, United Kingdom.

Publication information

Proc Natl Acad Sci U S A. 2020 Aug 18;117(33):19664-19669. doi: 10.1073/pnas.2001741117. Epub 2020 Aug 3.

Abstract

The quest for low-dimensional models which approximate high-dimensional data is pervasive across the physical, natural, and social sciences. The dominant paradigm underlying most standard modeling techniques assumes that the data are concentrated near a single unknown manifold of relatively small intrinsic dimension. Here, we present a systematic framework for detecting interfaces and related anomalies in data which may fail to satisfy the manifold hypothesis. By computing the local topology of small regions around each data point, we are able to partition a given dataset into disjoint classes, each of which can be individually approximated by a single manifold. Since these manifolds may have different intrinsic dimensions, local topology discovers singular regions in data even when none of the points have been sampled precisely from the singularities. We showcase this method by identifying the intersection of two surfaces in the 24-dimensional space of cyclo-octane conformations and by locating all of the self-intersections of a Henneberg minimal surface immersed in 3-dimensional space. Due to the local nature of the topological computations, the algorithmic burden of performing such data stratification is readily distributable across several processors.
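The abstract does not spell out how the local topology around each point is computed. As a rough illustration only, the sketch below assumes one very simple proxy: count the connected components of the data that fall inside a small annulus centred on each point. On a curve, a regular point sees two components (one on each side), an endpoint sees one, and a crossing sees more. The toy dataset (two line segments crossing at the origin on an integer grid), the function names `annulus_components` and `classify`, and the parameters `r_in`, `r_out`, and `link` are all hypothetical choices made for this example, not taken from the paper.

```python
from itertools import combinations

def annulus_components(points, center, r_in, r_out, link):
    """Count connected components among the points in an annulus around `center`.

    Two points are joined when they lie within distance `link` of each other.
    """
    cx, cy = center
    ring = [p for p in points
            if r_in ** 2 <= (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= r_out ** 2]
    parent = {p: p for p in ring}

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in combinations(ring, 2):
        if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= link ** 2:
            parent[find(p)] = find(q)
    return len({find(p) for p in ring})

def classify(points, r_in=4, r_out=8, link=1.5):
    """Label each point by the component count of its annular neighbourhood."""
    names = {1: "boundary", 2: "regular"}  # any other count is treated as singular
    return {p: names.get(annulus_components(points, p, r_in, r_out, link), "singular")
            for p in points}

# Toy data (assumed): two unit-spaced segments crossing at the origin.
points = [(i, 0) for i in range(-20, 21)] + [(0, j) for j in range(-20, 21) if j != 0]
labels = classify(points)
singular = sorted(p for p, label in labels.items() if label == "singular")
```

On this toy input, `singular` contains the 13 grid points within three steps of the crossing, even though the test never uses the fact that the origin itself was sampled. Each label depends only on a point's own neighbourhood, so the loop in `classify` could be split across workers without any coordination.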

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c04/7443892/f70a456761c5/pnas.2001741117fig01.jpg