Tree-Based Methods for Clustering Time Series Using Domain-Relevant Attributes

42 pages. Posted: 12 Nov 2018. Last revised: 23 Nov 2018.

Mahsa Ashouri

Institute of Statistical Science, Academia Sinica

Galit Shmueli

Institute of Service Science, National Tsing Hua University, Taiwan

Chor-yiu (CY) Sin

National Tsing Hua University

Date Written: November 12, 2018

Abstract

We propose two new methods for clustering time series that capture temporal information (trend, seasonality, and autocorrelation) as well as domain-relevant cross-sectional attributes. The methods are based on model-based partitioning (MOB) trees and can be used as an automated yet transparent tool for clustering a large collection of time series. The single-step method clusters series by trend, seasonality, time series lags, and domain-relevant cross-sectional attributes, using a single linear regression model. The two-step method first clusters by trend, seasonality, and domain-relevant cross-sectional attributes, and then further clusters the residual series by autocorrelation and the domain-relevant cross-sectional attributes. Both methods produce clusters that are interpretable by domain experts. We illustrate the usefulness of the proposed clustering approach by considering one-step-ahead forecasting. Using a large set of Wikipedia article pageview time series, we compare our approach empirically with forecasting each series separately using an Autoregressive Integrated Moving Average (ARIMA) model. Our results show that the tree-based approach produces forecasts that are practically on par with those of ARIMA models, yet is significantly faster and more efficient, and is therefore suitable for scaling to large collections of time series. Moreover, our method produces simple parametric forecasting models for interpretable clusters of time series, whereas ARIMA cannot provide such interpretability.
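
As a rough illustration of the kind of linear regression the single-step method relies on, the sketch below regresses one series on a linear trend, seasonal dummies, and its own lag, then produces a one-step-ahead forecast. It is not the authors' implementation and omits the MOB partitioning step entirely. The function names (`build_design`, `fit_and_forecast`), the weekly period of 7, and the use of a single lag are assumptions made only for this example.

```python
import numpy as np


def build_design(y, period=7, n_lags=1):
    """Build a design matrix with intercept, linear trend, seasonal dummies, and lags.

    Hypothetical helper for illustration; `period` and `n_lags` are assumed values.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    rows = np.arange(n_lags, n)  # skip rows that lack a full lag history
    cols = [np.ones(len(rows)), rows.astype(float)]
    # Seasonal dummies, with season 0 as the baseline level.
    for s in range(1, period):
        cols.append((rows % period == s).astype(float))
    # Lagged values of the series itself (autocorrelation terms).
    for k in range(1, n_lags + 1):
        cols.append(y[rows - k])
    return np.column_stack(cols), y[rows]


def fit_and_forecast(y, period=7, n_lags=1):
    """Fit the regression by least squares and forecast the next observation."""
    y = np.asarray(y, dtype=float)
    X, target = build_design(y, period, n_lags)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    n = len(y)
    # Regressor row for time index n, the first unobserved period.
    x_next = [1.0, float(n)]
    x_next += [1.0 if n % period == s else 0.0 for s in range(1, period)]
    x_next += [y[n - k] for k in range(1, n_lags + 1)]
    return beta, float(np.dot(x_next, beta))


if __name__ == "__main__":
    # Synthetic noise-free series: trend plus a repeating weekly pattern.
    season = np.array([0.0, 1.0, 3.0, 2.0, 0.0, -1.0, -2.0])
    t = np.arange(56)
    series = 2.0 + 0.5 * t + season[t % 7]
    coefs, forecast = fit_and_forecast(series)
    print("one-step-ahead forecast:", forecast)
```

In the approach described above, a regression of this general form would be fitted within each cluster found by the tree, with the cross-sectional attributes serving as candidate split variables; that partitioning logic is not shown here.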

Keywords: Time Series, Clustering, Model-Based Partitioning Tree, Linear Regression, Autoregressive Integrated Moving Average (ARIMA), Forecasting

Suggested Citation

Ashouri, Mahsa and Shmueli, Galit and Sin, Chor-yiu (CY), Tree-Based Methods for Clustering Time Series Using Domain-Relevant Attributes (November 12, 2018). Available at SSRN: https://ssrn.com/abstract=3282849 or http://dx.doi.org/10.2139/ssrn.3282849

Mahsa Ashouri (Contact Author)

Institute of Statistical Science, Academia Sinica

No. 128, Section 2, Academia Rd, Nangang District
Taipei city, 11529
Taiwan

Galit Shmueli

Institute of Service Science, National Tsing Hua University, Taiwan

Hsinchu, 30013
Taiwan

HOME PAGE: http://www.iss.nthu.edu.tw

Chor-yiu (CY) Sin

National Tsing Hua University

Department of Economics
National Tsing Hua University
Hsinchu, Taiwan 30013
Taiwan
886-3-516-2134 (Phone)
886-3-562-9805 (Fax)
