Matrix-Factorization-Based Dimensionality Reduction in the Predictive Modeling Process: A Design Science Perspective

41 Pages Posted: 30 Sep 2016

Jessica Clark

New York University, Department of Information, Operations, and Management Sciences, Students

Foster Provost

New York University

Date Written: September 2016

Abstract

Dimensionality Reduction (DR) is frequently employed in the predictive modeling process with the goal of improving the generalization performance of models. This paper takes a design science perspective on DR. We treat it as an important business analytics artifact and investigate its utility in the context of binary classification, with the goal of understanding its proper use and thus improving predictive modeling research and practice. Despite DR's popularity, we show that many published studies fail to undertake the comparison necessary to establish that it actually improves performance. We then conduct an experimental comparison between binary classification with and without matrix-factorization-based DR as a preprocessing step on the features. In particular, we investigate DR in the context of supervised complexity control. These experiments use three classifiers and three matrix-factorization-based DR techniques, and measure performance on a total of 26 classification tasks. We find that DR is generally not beneficial for binary classification. Specifically, the more difficult the problem, the more DR is able to improve performance (but it reduces performance on easier problems). However, this relationship depends on complexity control: DR's benefit is eliminated completely when state-of-the-art methods are used for complexity control. The wide variety of experimental conditions allows us to dig more deeply into when and why the different forms of complexity control are useful. We find that L2-regularized logistic regression models trained on the original feature set have the best performance in general. The relative benefit provided by DR increases when using a classifier that incorporates feature selection; unfortunately, the performance of these models, even with DR, is lower in general. We compare three matrix-factorization-based DR algorithms and find that none does better than using the full feature set, but of the three, SVD has the best performance. The results in this paper should be broadly useful for researchers and industry practitioners who work in applied data science. In particular, they emphasize the design science principle that adding design elements to the predictive modeling process should be done with attention to whether they add value.
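
The comparison described in the abstract (the same classifier trained with and without a matrix-factorization DR step, each under tuned complexity control) might be set up roughly as follows. This is a minimal sketch, not the authors' experimental code: the use of scikit-learn, the synthetic data from `make_classification`, the helper name `compare_with_and_without_dr`, the grid of `C` values, the number of SVD components, and AUC as the metric are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def compare_with_and_without_dr(X, y, n_components=10, seed=0):
    """Return mean cross-validated AUC as (full feature set, SVD-reduced features).

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    # Assumed grid of inverse regularization strengths for the L2 penalty.
    c_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
    inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

    def tuned(steps):
        # The inner grid search selects the regularization strength on
        # training folds only, so the outer estimate is not tuned on test data.
        return GridSearchCV(Pipeline(steps), c_grid, scoring="roc_auc", cv=inner_cv)

    # LogisticRegression applies an L2 penalty by default.
    full = tuned([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    reduced = tuned([
        ("scale", StandardScaler()),
        ("svd", TruncatedSVD(n_components=n_components, random_state=seed)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])

    auc_full = cross_val_score(full, X, y, scoring="roc_auc", cv=outer_cv).mean()
    auc_reduced = cross_val_score(reduced, X, y, scoring="roc_auc", cv=outer_cv).mean()
    return auc_full, auc_reduced


# Synthetic stand-in for one binary classification task (an assumption;
# the paper's 26 tasks are not reproduced here).
X, y = make_classification(
    n_samples=400, n_features=50, n_informative=10, random_state=0
)
auc_full, auc_reduced = compare_with_and_without_dr(X, y)
print(f"AUC, full feature set: {auc_full:.3f}")
print(f"AUC, SVD-reduced:      {auc_reduced:.3f}")
```

Keeping the SVD step inside the pipeline means the factorization is refit on each training fold, so both arms of the comparison are evaluated on held-out data they never saw. Results on a single synthetic task say nothing about the paper's findings; the sketch only shows the shape of the with/without comparison.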

Suggested Citation

Clark, Jessica and Provost, Foster, Matrix-Factorization-Based Dimensionality Reduction in the Predictive Modeling Process: A Design Science Perspective (September 2016). NYU Working Paper No.; CBA-16-01. Available at SSRN: https://ssrn.com/abstract=2845851

Jessica Clark

New York University, Department of Information, Operations, and Management Sciences, Students

New York, NY
United States

Foster Provost (Contact Author)

New York University

44 West Fourth Street
New York, NY 10012
United States
