Distributed Approaches to Spatial Pivot Indexing

Date

Authors

Tyler, Kevin

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Spatial indexing is critical for retrieving data efficiently from geospatial databases, and has been a long-standing research direction of GIS. In a departure from recent spatial indexing paradigms, this study leverages pivot indexing. Pivot indexing begins by selecting a small number of points from the dataset. Then, the distances from the points in this subset- the pivots- to every point in the database are stored in secondary memory. During query time, these pre-computed values are used to evaluate candidates for range or nearest neighbor search. This approach offers a substantial reduction in the number of distance computations necessary to evaluate objects in the spatial plane. While previous spatial pivot indexing research leverages graphical processing units, this study utilizes an alternative parallelization mechanism- distributed computing. The Hadoop file system is used to accelerate index creation and querying in a scalable fashion. The results are then compared to an existing distributed solution. I ultimately discovered that my implementation of the pivot index at the distributed level underperformed existing methods by a small margin; however, my implementation offered an improved kNN query performance. These results affirm the legitimacy of the pivot indexing approach, and suggest that similar approaches deserve further investigation. Ultimately, this research does not assert that pivot indexing is superior to other approaches. Rather, it is an exploration of pivot indices in a specific computing context (the Hadoop File System). Thus, the contribution to the research community is in application. The outcome of this study is to demonstrate the utility of this particular indexing paradigm in the Hadoop software environment.

Description

Keywords

Spatial indexing, GIS, Distributed computing, Information retrieval

Citation