Robotics & Big Data Labs, University of Haifa, 3498838 Haifa, Israel.
Sensors (Basel). 2020 May 27;20(11):3042. doi: 10.3390/s20113042.
A coreset of a dataset is a small weighted set, such that querying the coreset provably yields a ( 1 + ε )-factor approximation to the original (full) dataset, for a given family of queries. This paper suggests accurate coresets ( ε = 0 ) that are subsets of the input for fundamental optimization problems. These coresets enabled us to implement a "Guardian Angel" system that computes pose-estimation in a rate > 20 frames per second. It tracks a toy quadcopter which guides guests in a supermarket, hospital, mall, airport, and so on. We prove that any set of matrices in R d × d whose sum is a matrix of rank , has a coreset whose sum has the same left and right singular vectors as , and consists of O ( d r ) = O ( d 2 ) matrices, independent of . This implies the first (exact, weighted subset) coreset of O ( d 2 ) points to problems such as linear regression, PCA/SVD, and Wahba's problem, with corresponding streaming, dynamic, and distributed versions. Our main tool is a novel usage of the Caratheodory Theorem for coresets, an algorithm that computes its set in time that is linear in its cardinality. Extensive experimental results on both synthetic and real data, companion video of our system, and open code are provided.
数据集的约简是一个小的加权集合,对于给定的查询族,查询约简可证明对原始(完整)数据集提供(1+ε)倍的近似。本文提出了针对基本优化问题的准确约简(ε=0),它们是输入的子集。这些约简使我们能够实现一个“守护天使”系统,该系统以 > 20 帧/秒的速度计算姿态估计。它跟踪一架玩具四旋翼飞行器,为超市、医院、商场、机场等场所的客人提供引导。我们证明了任何一组在 R d×d 中的矩阵,其和为秩 的矩阵 ,都有一个约简,其和具有与 相同的左右奇异向量,并且由 O(dr)=O(d 2)个矩阵组成,与 无关。这意味着第一个(精确的、加权子集)约简为 O(d 2)个点的问题,例如线性回归、PCA/SVD 和 Wahba 问题,以及相应的流处理、动态和分布式版本。我们的主要工具是使用 Caratheodory 定理进行约简的新方法,该算法以其基数的线性时间计算其集合。提供了大量关于合成和真实数据的实验结果、我们系统的配套视频以及开放代码。