基于精确轨迹查询轨迹：一种基于布隆过滤器的方法。

Query the trajectory based on the precise track: a Bloom filter-based approach.

作者信息

Wang Zengjie, Luo Wen, Yuan Linwang, Gao Hong, Wu Fan, Hu Xu, Yu Zhaoyuan

机构信息

Ministry of Education, Key Laboratory of Virtual Geographic Environment (Nanjing Normal University), Nanjing, China.

State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province), Nanjing, China.

出版信息

Geoinformatica. 2021;25(2):397-416. doi: 10.1007/s10707-021-00433-2. Epub 2021 Mar 15.

DOI:10.1007/s10707-021-00433-2

PMID:33746566

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC7957281/

Abstract

Fast and precise querying in a given set of trajectory points is an important issue of trajectory query. Typically, there are massive trajectory data in the database, yet the query sets only have a few points, which is a challenge for the superior performance of trajectory querying. The current trajectory query methods commonly use the tree-based index structure and the signature-based method to classify, simplify, and filter the trajectory to improve the performance. However, the unstructured essence and the spatiotemporal heterogeneity of the trajectory-sequence lead these methods to a high degree of spatial overlap, frequent I/O, and high memory occupation. Thus, they are not suitable for the time-critical tasks of trajectory big data. In this paper, a query method of trajectory is developed on the Bloom Filter. Based on the gridded space and geocoding, the spatial trajectory sequences (tracks) query is transformed into the query of the text string. The geospace was regularly divided by the geographic grid, and each cell was assigned an independent geocode, converting the high-dimensional irregular space trajectory query into a one-dimensional string query. The point in each cell is regarded as a signature, which forms a mapping to the bit-array of the Bloom Filter. This conversion effectively eliminates the high degree of overlap and instability of query performance. Meanwhile, the independent coding ensures the uniqueness of the whole tracks. In this method, there is no need for additional I/O on the raw trajectory data when the track is queried. Compared to the original data, the memory occupied by this method is negligible. Based on Beijing Taxi and Shenzhen bus trajectory data, an experiment using this method was constructed, and random queries under a variety of conditions boundaries were constructed. The results verified that the performance and stability of our method, compared to R*tree index, have been improved by 2000 to 4000 times, based on one million to tens of millions of trajectory data. And the Bloom Filter-based query method is hardly affected by grid size, original data size, and length of tracks. With such a time advantage, our method is suitable for time-critical spatial computation tasks, such as anti-terrorism, public safety, epidemic prevention, and control, etc.

摘要

在给定的轨迹点集中进行快速精确查询是轨迹查询的一个重要问题。通常，数据库中存在大量轨迹数据，但查询集只有几个点，这对轨迹查询的卓越性能构成了挑战。当前的轨迹查询方法通常使用基于树的索引结构和基于签名的方法来对轨迹进行分类、简化和过滤，以提高性能。然而，轨迹序列的非结构化本质和时空异质性导致这些方法存在高度的空间重叠、频繁的I/O操作和高内存占用。因此，它们不适用于对时间要求严格的轨迹大数据任务。本文基于布隆过滤器开发了一种轨迹查询方法。基于网格化空间和地理编码，将空间轨迹序列（轨迹）查询转换为文本字符串查询。地理空间通过地理网格进行规则划分，每个单元格被赋予一个独立的地理编码，将高维不规则空间轨迹查询转换为一维字符串查询。每个单元格中的点被视为一个签名，它与布隆过滤器的位阵列形成映射。这种转换有效地消除了查询性能的高度重叠和不稳定性。同时，独立编码确保了整个轨迹的唯一性。在这种方法中，查询轨迹时无需对原始轨迹数据进行额外的I/O操作。与原始数据相比，该方法占用的内存可以忽略不计。基于北京出租车和深圳公交车轨迹数据构建了使用该方法的实验，并构建了各种条件边界下的随机查询。结果验证了基于数百万到数千万轨迹数据，与R*树索引相比，我们的方法在性能和稳定性上提高了2000到4000倍。并且基于布隆过滤器的查询方法几乎不受网格大小、原始数据大小和轨迹长度的影响。凭借如此的时间优势，我们的方法适用于对时间要求严格的空间计算任务，如反恐、公共安全、疫情防控等。