Ascent EM for Efficient Curve-Clustering in Large Online Auction Databases
25 Pages Posted: 18 May 2006
Date Written: November 30, 2004
In this paper we propose a sampling-based implementation of the EM algorithm for model-based clustering. By sampling-based we mean that the algorithm uses only a small sample from the entire database in every iteration, which allows for significant computational improvements. In contrast to previous sampling-based versions, we propose selecting the sample randomly, since a random selection allows for statistical evaluation of the algorithm's progress. By appealing to EM's well-known likelihood ascent property, the algorithm chooses samples that are as small as possible, ensuring computational efficiency, yet large enough to advance the progress of the method. The algorithm is stochastic in nature and has the potential to overcome local traps and suboptimal solutions. We apply the algorithm to the problem of clustering infinite-dimensional curves and illustrate it on a large database of online auctions.
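The idea of pairing random subsamples with a statistical check on likelihood ascent can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes scalar data and a one-dimensional Gaussian mixture rather than curves, and the function names, the quantile-based initialization, the one-sided normal bound (`z`), and the sample-doubling rule (`growth`) are all illustrative assumptions. Each iteration fits one EM step on a random subsample, then uses a second, independent subsample to test whether the log-likelihood has increased; if the increase is not statistically supported, the sample size grows.

```python
import numpy as np

def _log_joint(x, weights, means, variances):
    # (n, k) array: log of weight_k * Normal(x_i | mean_k, variance_k)
    return (np.log(weights)
            - 0.5 * np.log(2 * np.pi * variances)
            - 0.5 * (x[:, None] - means) ** 2 / variances)

def _row_logsumexp(a):
    # Numerically stable log(sum(exp(a))) along each row
    peak = a.max(axis=1)
    return peak + np.log(np.exp(a - peak[:, None]).sum(axis=1))

def log_density(x, weights, means, variances):
    # Per-observation log-likelihood under the mixture
    return _row_logsumexp(_log_joint(x, weights, means, variances))

def em_step(x, weights, means, variances):
    # One standard EM update computed on the sample x only
    lj = _log_joint(x, weights, means, variances)
    resp = np.exp(lj - _row_logsumexp(lj)[:, None])   # E-step responsibilities
    nk = resp.sum(axis=0) + 1e-12                     # guard against empty components
    new_means = (resp * x[:, None]).sum(axis=0) / nk
    new_vars = (resp * (x[:, None] - new_means) ** 2).sum(axis=0) / nk + 1e-6
    return nk / nk.sum(), new_means, new_vars

def sampled_em(data, k=2, start_size=200, growth=2.0, z=1.645,
               max_iter=100, seed=0):
    # Hypothetical driver: all defaults are assumptions, not values from the paper
    rng = np.random.default_rng(seed)
    n = len(data)
    size = min(start_size, n)
    params = (np.full(k, 1.0 / k),
              np.quantile(data, (np.arange(k) + 0.5) / k),  # assumed initialization
              np.full(k, data.var()))
    for _ in range(max_iter):
        fit = rng.choice(data, size=size, replace=False)
        proposal = em_step(fit, *params)
        # Independent random sample to evaluate the proposed update
        check = rng.choice(data, size=size, replace=False)
        gain = log_density(check, *proposal) - log_density(check, *params)
        lower = gain.mean() - z * gain.std(ddof=1) / np.sqrt(size)
        if lower > 0:
            params = proposal            # ascent is statistically supported
        elif size < n:
            size = min(n, int(np.ceil(growth * size)))  # sample too small: enlarge
        else:
            break                        # no detectable ascent on the full data
    return params, size
```

Calling `sampled_em(data)` on a one-dimensional NumPy array returns the fitted `(weights, means, variances)` together with the final sample size. Evaluating the update on a sample independent of the one used for fitting is a design choice of this sketch; it avoids the optimistic bias of scoring an update on the same observations that produced it.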
Keywords: stochastic optimization, Monte Carlo, EM algorithm, clustering, functional data, electronic commerce, online auction, eBay