Ascent EM for Efficient Curve-Clustering in Large Online Auction Databases

25 Pages Posted: 18 May 2006

Wolfgang Jank

University of Maryland - Decision and Information Technologies Department

Date Written: November 30, 2004

Abstract

In this paper we propose a sampling-based implementation of the EM algorithm for model-based clustering. By sampling-based we mean that the algorithm uses only a small sample from the entire database in every iteration. Using only a small sample allows for significant computational improvements. In contrast to previous sampling-based versions, we suggest selecting the sample randomly, since a random selection allows for statistical evaluation of the algorithm's progress. By appealing to EM's famous likelihood ascent property, the algorithm chooses samples as small as possible, thus ensuring computational efficiency, while at the same time keeping them large enough to advance the progress of the method. The algorithm is stochastic in nature and has the potential to overcome local traps and suboptimal solutions. We apply the algorithm to the problem of clustering infinite-dimensional curves and illustrate it on a large database of online auctions.
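The abstract does not spell out the update or the sample-size rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the author's implementation. It assumes each curve has already been reduced to a short vector of coefficients (a hypothetical preprocessing step), models those vectors as a Gaussian mixture with diagonal covariances, and computes each EM update from a random sample only. The ascent check is an assumed rule: a candidate update is accepted when a one-sided normal lower bound on the average log-likelihood change, measured on a fresh random sample, is positive; otherwise the candidate is discarded and the sample is enlarged. The function names `em_update` and `ascent_em` and the parameters `n0`, `growth` and `z` are hypothetical.

```python
import math
import random
import statistics


def log_component_densities(x, weights, means, variances):
    """log(pi_k * N(x | mu_k, diag(var_k))) for every component k."""
    out = []
    for w, mu, var in zip(weights, means, variances):
        s = math.log(w)
        for xj, mj, vj in zip(x, mu, var):
            s += -0.5 * math.log(2 * math.pi * vj) - (xj - mj) ** 2 / (2 * vj)
        out.append(s)
    return out


def log_likelihood_point(x, params):
    """Mixture log-density of one vector, via the log-sum-exp trick."""
    logs = log_component_densities(x, *params)
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))


def em_update(sample, params, var_floor=1e-6):
    """One E-step and M-step computed on the sample only, not the full data."""
    weights, means, variances = params
    k, d = len(weights), len(sample[0])
    resp = []  # E-step: posterior cluster probabilities for each sampled vector
    for x in sample:
        logs = log_component_densities(x, weights, means, variances)
        m = max(logs)
        e = [math.exp(v - m) for v in logs]
        total = sum(e)
        resp.append([v / total for v in e])
    new_w, new_mu, new_var = [], [], []
    for c in range(k):  # M-step: weighted proportions, means and variances
        nk = sum(r[c] for r in resp) + 1e-12
        mu = [sum(r[c] * x[j] for r, x in zip(resp, sample)) / nk for j in range(d)]
        var = [max(var_floor,
                   sum(r[c] * (x[j] - mu[j]) ** 2 for r, x in zip(resp, sample)) / nk)
               for j in range(d)]
        new_w.append(nk / len(sample))
        new_mu.append(mu)
        new_var.append(var)
    return new_w, new_mu, new_var


def ascent_em(data, params, n0=50, growth=1.5, z=1.645, max_iter=100, seed=0):
    """Sampling-based EM that enlarges the sample when ascent is not evident."""
    rng = random.Random(seed)
    N = len(data)
    n = min(n0, N)
    for _ in range(max_iter):
        candidate = em_update(rng.sample(data, n), params)
        check = rng.sample(data, n)  # fresh random sample for the ascent test
        diffs = [log_likelihood_point(x, candidate) - log_likelihood_point(x, params)
                 for x in check]
        mean = statistics.fmean(diffs)
        se = statistics.stdev(diffs) / math.sqrt(n) if n > 1 else math.inf
        if mean - z * se > 0:
            params = candidate  # significant likelihood ascent: accept the update
        elif n == N:
            params = candidate  # full-data step is plain EM: accept and stop
            break
        else:
            n = min(N, math.ceil(n * growth))  # weak evidence: retry with a larger sample
    return params, n
```

Discarding a rejected candidate and redrawing a larger sample is a simplification chosen for brevity; a variant could instead append new draws to the existing sample. Once the sample covers the whole database, the update is ordinary EM, to which the likelihood ascent property applies, so the sketch accepts that step and stops.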

Keywords: stochastic optimization, Monte Carlo, EM algorithm, clustering, functional data, electronic commerce, online auction, eBay

Suggested Citation

Jank, Wolfgang, Ascent EM for Efficient Curve-Clustering in Large Online Auction Databases (November 30, 2004). Robert H. Smith School Research Paper No. RHS-06-008. Available at SSRN: https://ssrn.com/abstract=902908 or http://dx.doi.org/10.2139/ssrn.902908

Wolfgang Jank (Contact Author)

University of Maryland - Decision and Information Technologies Department

Robert H. Smith School of Business
4300 Van Munching Hall
College Park, MD 20742
United States
301-405-1118 (Phone)

HOME PAGE: http://www.smith.umd.edu/faculty/wjank/
