Chen Shaoping, Tu Yi-Cheng, Xia Yuni
Department of Mathematics, Wuhan University of Technology, 122 Luosi Road, 430070 Wuhan, Hubei, People's Republic of China
VLDB J. 2011 Aug 1;20(4):471-494. doi: 10.1007/s00778-010-0205-7.
Many scientific and engineering fields produce large volume of spatiotemporal data. The storage, retrieval, and analysis of such data impose great challenges to database systems design. Analysis of scientific spatiotemporal data often involves computing functions of all point-to-point interactions. One such analytics, the Spatial Distance Histogram (SDH), is of vital importance to scientific discovery. Recently, algorithms for efficient SDH processing in large-scale scientific databases have been proposed. These algorithms adopt a recursive tree-traversing strategy to process point-to-point distances in the visited tree nodes in batches, thus require less time when compared to the brute-force approach where all pairwise distances have to be computed. Despite the promising experimental results, the complexity of such algorithms has not been thoroughly studied. In this paper, we present an analysis of such algorithms based on a geometric modeling approach. The main technique is to transform the analysis of point counts into a problem of quantifying the area of regions where pairwise distances can be processed in batches by the algorithm. From the analysis, we conclude that the number of pairwise distances that are left to be processed decreases exponentially with more levels of the tree visited. This leads to the proof of a time complexity lower than the quadratic time needed for a brute-force algorithm and builds the foundation for a constant-time approximate algorithm. Our model is also general in that it works for a wide range of point spatial distributions, histogram types, and space-partitioning options in building the tree.
许多科学和工程领域都会产生大量的时空数据。对此类数据的存储、检索和分析给数据库系统设计带来了巨大挑战。对科学时空数据的分析通常涉及计算所有点对点交互的函数。其中一种分析方法,即空间距离直方图(SDH),对科学发现至关重要。最近,已经提出了在大规模科学数据库中高效处理SDH的算法。这些算法采用递归树遍历策略,批量处理访问的树节点中的点对点距离,因此与必须计算所有成对距离的暴力方法相比,所需时间更少。尽管实验结果很有前景,但此类算法的复杂性尚未得到深入研究。在本文中,我们基于几何建模方法对此类算法进行了分析。主要技术是将点数分析转化为一个问题,即量化算法可以批量处理成对距离的区域面积。通过分析,我们得出结论,随着访问树的层数增加,待处理的成对距离数量呈指数下降。这导致了一个时间复杂度低于暴力算法所需二次时间的证明,并为常数时间近似算法奠定了基础。我们的模型也具有通用性,因为它适用于构建树时的广泛点空间分布、直方图类型和空间划分选项。