Mean-Variance Analysis of the Performance of Spatial Clustering Methods
Geographical Information Science, 1998, Vol. 12, No. 3, 269-289
University of Alberta School of Business Research Paper No. 2013-1086
Posted: 2 Jul 2013
Date Written: June 1, 1997
Abstract
Geographical Information Systems (GIS) involve the manipulation of large spatial data sets, and the performance of these systems is often determined by how these data sets are organized on secondary storage (disk). This paper describes a detailed simulation study of two non-recursive spatial clustering methods, the Inverted Naive and the Spiral methods, and compares them with the Hilbert fractal method, which previous studies have shown to outperform other recursive clustering methods. The paper highlights the importance of analysing the sample variance, and not only the mean, when evaluating the relative performance of spatial ordering methods. The clustering performance of each method is examined in terms of both the mean and the variance of the number of clusters (runs of consecutive disk blocks) that must be accessed to retrieve a query region of a given size and orientation. The results show that, for a blocking factor of 1, the Spiral method has the best mean values, on average about 30% better than those of the other two methods. In terms of variance, the Inverted Naive method is the best, followed by the Spiral and Hilbert methods, in that order. We also study the impact of varying the query size and the skew ratio (between the X and Y dimensions) for each method. Although these performance results do not generalize to higher blocking factors, we believe they are useful to both researchers and practitioners, because several previous studies have also examined this special case and because it has important implications for the performance of GIS applications.
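The performance measure described above can be illustrated with a small sketch. It assumes an n x n grid in which each cell is one disk block (a blocking factor of 1), a linear ordering that assigns each cell a block number, and a rectangular query w cells wide and h cells tall. The cluster count of one query placement is the number of runs of consecutive block numbers among the cells it covers. The mean and variance are then taken over every placement of the query in the grid. The Inverted Naive and Spiral orderings are not defined in this abstract, so the sketch does not attempt them. It uses a standard Hilbert-curve index and a plain row-major baseline instead. The names `hilbert_index`, `row_major_index`, `count_clusters` and `cluster_stats` are hypothetical, and enumerating every placement (rather than sampling) is an assumption of this sketch, not necessarily the paper's procedure.

```python
from statistics import mean, pvariance


def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve filling an n x n grid.

    n must be a power of two. This is the standard iterative
    xy-to-distance conversion, not code taken from the paper.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the next level sees a canonical sub-curve.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def row_major_index(n, x, y):
    """Plain row-by-row ordering, used here only as a simple baseline."""
    return y * n + x


def count_clusters(positions):
    """Number of runs of consecutive block numbers among the given positions."""
    ps = sorted(positions)
    return 1 + sum(1 for a, b in zip(ps, ps[1:]) if b != a + 1)


def cluster_stats(order, n, w, h):
    """Mean and population variance of the cluster count for a w x h query
    (w columns, h rows) over every placement inside the n x n grid.

    Assumes a blocking factor of 1: each grid cell is exactly one disk block.
    """
    counts = []
    for y0 in range(n - h + 1):
        for x0 in range(n - w + 1):
            blocks = [order(n, x0 + i, y0 + j) for j in range(h) for i in range(w)]
            counts.append(count_clusters(blocks))
    return mean(counts), pvariance(counts)


if __name__ == "__main__":
    # Compare orderings for a skewed 4 x 2 query on a 16 x 16 grid.
    for name, order in [("row-major", row_major_index), ("Hilbert", hilbert_index)]:
        m, v = cluster_stats(order, 16, 4, 2)
        print(f"{name}: mean={m:.2f} variance={v:.2f}")
```

Swapping w and h changes the query's orientation, and varying the ratio w/h mimics the skew ratio mentioned above. The row-major baseline always yields exactly h clusters for a query narrower than the grid, so its variance is zero, while its mean grows with query height. This is one way to see why the mean and the variance need to be reported together.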