Clustering Over Time and Data Set Comparison

Daniel M. Fleder
University of Pennsylvania - The Wharton School

Balaji Padmanabhan
University of South Florida - College of Business Administration


August 30, 2009


Abstract:     
Cluster analysis is useful for data interpretation. Instead of studying thousands of records, one can create a smaller number of clusters and interpret a prototype for each. Often, however, the world being interpreted via clusters changes over time. The naive approach of independently reclustering the data each period has a significant drawback: even if the data's distribution is unchanged, sampling variation can cause cluster prototypes to differ from one period to the next, which makes cluster solutions difficult to compare. In this paper we present a method for clustering sequential data sets and comparing cluster solutions over time. At a macro level, we examine how cluster prototypes change over time; at a micro level, we examine how objects transition among these prototypes. The method works as follows. We take as given the cluster prototypes from the first data set. In clustering the new data, the previous prototypes are constrained to remain unchanged; this ensures consistency between old and new prototypes. However, to fit the new data well, the second clustering must be flexible enough to add new prototypes where needed. This amounts to an optimization criterion that trades off consistency (reuse of old prototypes) against model fit (cluster fit on the new data). We formulate this as a constrained optimization problem and present a solution technique. A feature of the technique is its ability to incorporate prior knowledge from the first period to define an appropriate consistency-fit tradeoff. We expect the method to be particularly relevant for business, as firms increasingly manage their customers through segments for which new data arrive over time.
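
The abstract does not state the formulation itself, so the following Python sketch is only an illustration of the idea, not the authors' technique. It assumes squared Euclidean distance, k-means-style updates, and a linear penalty per added prototype as the consistency-fit tradeoff. All function names and the `penalty` and `max_new` parameters are hypothetical. Prototypes from the first period are held fixed, new prototypes are added only when the improvement in fit outweighs the penalty, and a simple count of label pairs shows how objects move between prototypes.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, prototypes):
    """Index of the nearest prototype for each point."""
    return [min(range(len(prototypes)), key=lambda j: sq_dist(p, prototypes[j]))
            for p in points]

def fit_with_fixed(points, fixed, n_new, iters=50):
    """k-means-style fit in which the `fixed` prototypes never move.

    Only the `n_new` extra prototypes are updated. `fixed` must be non-empty.
    Returns (prototypes, labels, sum of squared errors).
    """
    protos = [tuple(f) for f in fixed]
    # Assumed initialisation: start each new prototype at the point
    # that is currently farthest from every existing prototype.
    for _ in range(n_new):
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in protos))
        protos.append(tuple(far))
    for _ in range(iters):
        labels = assign(points, protos)
        moved = False
        for j in range(len(fixed), len(protos)):  # skip the fixed prototypes
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                if mean != protos[j]:
                    protos[j] = mean
                    moved = True
        if not moved:
            break
    labels = assign(points, protos)
    sse = sum(sq_dist(p, protos[lab]) for p, lab in zip(points, labels))
    return protos, labels, sse

def cluster_new_period(points, old_prototypes, max_new=3, penalty=1.0):
    """Pick the number of new prototypes minimising fit + penalty * count."""
    best = None
    for m in range(max_new + 1):
        protos, labels, sse = fit_with_fixed(points, old_prototypes, m)
        score = sse + penalty * m  # assumed consistency-fit tradeoff
        if best is None or score < best[0]:
            best = (score, protos, labels)
    return best[1], best[2]

def transitions(old_labels, new_labels):
    """Micro-level view: count objects moving from old label to new label."""
    return Counter(zip(old_labels, new_labels))

# Toy data: two prototypes from period 1, and a third group emerging in period 2.
old_prototypes = [(0.0, 0.0), (10.0, 10.0)]
period1_labels = [0, 0, 1, 1, 0, 1, 1]  # hypothetical period-1 assignments
period2_points = [(0, 1), (1, 0), (10, 9), (9, 10), (0, 10), (1, 10), (0, 11)]

new_prototypes, period2_labels = cluster_new_period(
    period2_points, old_prototypes, max_new=3, penalty=2.0)
moves = transitions(period1_labels, period2_labels)
print(new_prototypes)
print(moves)
```

A larger `penalty` favours reuse of the old prototypes, while a smaller one lets the second clustering add prototypes more freely. How that value should be derived from first-period knowledge is the subject of the paper and is not reproduced here.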

Keywords: clustering, cluster analysis, resampling, penalty methods

Working Paper Series

Date posted: September 01, 2009 ; Last revised: September 01, 2009

Suggested Citation

Fleder, Daniel M. and Padmanabhan, Balaji, Clustering Over Time and Data Set Comparison (August 30, 2009). Available at SSRN: http://ssrn.com/abstract=1464537



Contact Information

Daniel M. Fleder (Contact Author)
University of Pennsylvania - The Wharton School
Philadelphia, PA 19104
United States

Balaji Padmanabhan
University of South Florida - College of Business Administration
4202 E. Fowler Avenue, BSN 3403
Tampa, FL 33620-5500
United States


