Tree-Based Methods for Clustering Time Series Using Domain-Relevant Attributes
42 Pages | Posted: 12 Nov 2018 | Last revised: 23 Nov 2018
Date Written: November 12, 2018
We propose two new methods for clustering time series that capture temporal information (trend, seasonality and autocorrelation) as well as domain-relevant cross-sectional attributes. The methods are based on model-based partitioning (MOB) trees and can serve as an automated yet transparent tool for clustering a large collection of time series. The single-step method clusters series by trend, seasonality, time series lags and domain-relevant cross-sectional attributes, using a single linear regression model. The two-step method first clusters by trend, seasonality and domain-relevant cross-sectional attributes, and then further clusters the residual series by autocorrelation and the domain-relevant cross-sectional attributes. Both methods produce clusters that are interpretable by domain experts. We illustrate the usefulness of the proposed clustering approach on one-step-ahead forecasting. We present empirical results comparing our approach with forecasting each series individually using an AutoRegressive Integrated Moving Average (ARIMA) model, on a large set of Wikipedia article pageview time series. Our results show that the tree-based approach produces forecasts that are practically on par with those of ARIMA models, while being significantly faster and more efficient, and therefore suitable for scaling to large collections of time series. Moreover, our method produces simple parametric forecasting models for interpretable clusters of time series, whereas ARIMA cannot provide such interpretability.
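The abstract does not give implementation details, so the following is only a minimal sketch, assuming Python with NumPy, of the kind of pooled linear regression the single-step method describes: an intercept, a linear trend, seasonal dummies and autoregressive lags, fitted jointly across all series in a cluster and then used for a one-step-ahead forecast. The split rule is a simplified stand-in and not the authors' procedure: it compares the pooled sum of squared errors (SSE) with the SSE of separate per-group models for one categorical cross-sectional attribute, whereas MOB trees decide splits with parameter-instability tests. All function names, parameters (`period`, `n_lags`, `min_gain`) and the synthetic data are hypothetical.

```python
import numpy as np


def design_row(history, t, period, n_lags):
    """Regressors for time index t: intercept, trend, seasonal dummies, lags."""
    season = np.zeros(period - 1)
    s = t % period
    if s > 0:  # season 0 is the baseline level absorbed by the intercept
        season[s - 1] = 1.0
    lags = [history[t - k] for k in range(1, n_lags + 1)]
    return np.concatenate(([1.0, float(t)], season, lags))


def build_xy(series_list, period, n_lags):
    """Stack the rows of every series in a group into one pooled regression."""
    X, y = [], []
    for series in series_list:
        for t in range(n_lags, len(series)):
            X.append(design_row(series, t, period, n_lags))
            y.append(series[t])
    return np.array(X), np.array(y)


def fit_ols(X, y):
    """Ordinary least squares; returns coefficients and the residual SSE."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    return beta, sse


def split_by_attribute(series_list, attrs, period, n_lags, min_gain=0.0):
    """Hypothetical one-level split on a categorical attribute.

    Simplified stand-in for a MOB split: split if the per-group models reduce
    the pooled SSE by more than min_gain (MOB instead uses instability tests).
    """
    beta_pooled, sse_pooled = fit_ols(*build_xy(series_list, period, n_lags))
    groups = {}
    for series, level in zip(series_list, attrs):
        groups.setdefault(level, []).append(series)
    models, sse_split = {}, 0.0
    for level, members in groups.items():
        beta, sse = fit_ols(*build_xy(members, period, n_lags))
        models[level] = beta
        sse_split += sse
    if sse_pooled - sse_split > min_gain:
        return models
    return {level: beta_pooled for level in groups}  # keep one pooled model


def forecast_next(series, beta, period, n_lags):
    """One-step-ahead forecast for the time index right after the series ends."""
    x = design_row(series, len(series), period, n_lags)
    return float(x @ beta)


# Synthetic example: weekly seasonality, two groups with opposite trends.
rng = np.random.default_rng(0)
period, n_lags, T = 7, 1, 84
weekly = np.array([0, 1, 2, 3, 2, 1, 0], dtype=float)


def make(slope):
    t = np.arange(T)
    return 20 + slope * t + weekly[t % period] + rng.normal(0, 0.1, T)


series = [make(0.5) for _ in range(3)] + [make(-0.5) for _ in range(3)]
attrs = ["A"] * 3 + ["B"] * 3
models = split_by_attribute(series, attrs, period, n_lags)
f = forecast_next(series[0], models["A"], period, n_lags)
truth = 20 + 0.5 * T + weekly[T % period]
print(f"forecast {f:.2f} vs noise-free value {truth:.2f}")
```

Because the two groups have opposite trends, the per-group models fit far better than the pooled one and the attribute is used for the split. With `min_gain=0.0`, almost any split reduces the SSE, so a practical criterion would need to account for the extra parameters; this is one reason MOB relies on statistical tests rather than a raw SSE comparison.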
Keywords: time series, clustering, model-based partitioning tree, linear regression, autoregressive integrated moving average (ARIMA), forecasting