Scalable Variational Gaussian Processes for Crowdsourcing: Glitch Detection in LIGO.

Author information

Morales-Alvarez Pablo, Ruiz Pablo, Coughlin Scott, Molina Rafael, Katsaggelos Aggelos K

Publication information

IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1534-1551. doi: 10.1109/TPAMI.2020.3025390. Epub 2022 Feb 3.

Abstract

In recent years, crowdsourcing has been transforming the way classification training sets are obtained. Instead of relying on a single expert annotator, crowdsourcing shares the labelling effort among a large number of collaborators. For instance, this is being applied at the Nobel Prize-winning Laser Interferometer Gravitational-Wave Observatory (LIGO) to detect glitches that might hinder the identification of true gravitational waves. The crowdsourcing scenario poses new challenges, as it has to deal with differing opinions from a heterogeneous group of annotators with unknown degrees of expertise. Probabilistic methods, such as Gaussian processes (GPs), have proven successful in modeling this setting. However, GPs do not scale well to large data sets, which hampers their broad adoption in real-world problems (in particular LIGO). This has led to the very recent introduction of deep-learning-based crowdsourcing methods, which have become the state of the art for this type of problem. However, the accurate uncertainty quantification provided by GPs has been partially sacrificed. This is an important aspect for astrophysicists at LIGO, since a glitch detection system should provide very accurate probability distributions for its predictions. In this work, we first leverage a standard sparse GP approximation (SVGP) to develop a GP-based crowdsourcing method that factorizes into mini-batches. This makes it able to cope with previously prohibitive data sets. This first approach, which we refer to as scalable variational Gaussian processes for crowdsourcing (SVGPCR), brings GP-based methods back to a state-of-the-art level and excels at uncertainty quantification. SVGPCR is shown to outperform deep-learning-based methods and previous probabilistic ones when applied to the LIGO data. Its behavior and main properties are carefully analyzed in a controlled experiment based on the MNIST data set. Moreover, recent GP inference techniques are also adapted to crowdsourcing and evaluated experimentally.
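
The abstract states that the method "factorizes into mini-batches". The NumPy sketch below illustrates what that property buys in general. When the data-dependent part of an objective is a sum of per-instance terms, a rescaled mini-batch sum is an unbiased estimate of the full sum. This is not the paper's implementation. The per-annotator confusion-matrix likelihood, the function names, and the toy data are all assumptions chosen for illustration, since the abstract does not specify the annotator model.

```python
import numpy as np

def per_instance_terms(probs, labels, confusion):
    """Expected log-likelihood of the crowd labels, one term per instance.

    Assumed (hypothetical) annotator model: each annotator has a K x K
    confusion matrix. Shapes:
      probs:     (N, K) probabilities over the latent true class
      labels:    (N, A) integer crowd labels, -1 where an annotator is missing
      confusion: (A, K, K), confusion[a, k, c] = P(annotator a says c | true k)
    """
    n, n_annotators = labels.shape
    out = np.zeros(n)
    for a in range(n_annotators):
        seen = labels[:, a] >= 0
        # log P(observed label | each possible true class), shape (n_seen, K)
        log_p = np.log(confusion[a][:, labels[seen, a]].T)
        out[seen] += (probs[seen] * log_p).sum(axis=1)
    return out

def minibatch_estimate(terms, batch_idx, n_total):
    """Rescaled mini-batch sum: an unbiased estimate of terms.sum()."""
    return n_total / len(batch_idx) * terms[batch_idx].sum()

# Toy data (entirely synthetic).
rng = np.random.default_rng(0)
N, A, K = 12, 3, 2
probs = rng.dirichlet(np.ones(K), size=N)
labels = rng.integers(-1, K, size=(N, A))  # -1 marks "not annotated"
confusion = np.stack([rng.dirichlet(5 * np.ones(K), size=K) for _ in range(A)])

terms = per_instance_terms(probs, labels, confusion)
full = terms.sum()

# Split the data into 3 disjoint batches of 4 and average the estimates.
batches = np.arange(N).reshape(3, 4)
estimates = [minibatch_estimate(terms, b, N) for b in batches]
print(full, np.mean(estimates))
```

Averaged over an equal-sized partition of the data, the mini-batch estimates recover the full sum exactly. In stochastic training each step would use a single batch, so the per-step cost depends on the batch size rather than on the size of the data set.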
