Dimensionality Reduction Via Matrix Factorization for Predictive Modeling from Large, Sparse Behavioral Data

38 Pages, Posted: 28 May 2015

Jessica Clark

New York University (NYU) - Department of Information, Operations, and Management Sciences

Foster Provost

New York University

Date Written: May 2015

Abstract

Matrix factorization is a popular technique for engineering features for use in predictive models; it is viewed as a key part of the predictive analytics process and is used in many different domain areas. The purpose of this paper is to investigate matrix-factorization-based dimensionality reduction as a design artifact in predictive analytics. With the rise in availability of large amounts of sparse behavioral data, this investigation comes at a time when traditional techniques must be reevaluated. Our contribution is based on two lines of inquiry: we survey the literature on dimensionality reduction in predictive analytics, and we undertake an experimental evaluation comparing predictive models built with and without dimensionality reduction on large, sparse behavioral data. Our survey of the dimensionality reduction literature reveals that, despite mixed empirical evidence as to the benefit of computing dimensionality reduction, it is frequently applied in predictive modeling research and application without either comparing to a model built using the full feature set or utilizing state-of-the-art predictive modeling techniques for complexity control. This presents a concern, as the survey reveals complexity control as one of the main reasons for employing dimensionality reduction. This lack of comparison is troubling in light of our empirical results. We experimentally evaluate the efficacy of dimensionality reduction in the context of a collection of predictive modeling problems from a large-scale published study. We find that utilizing dimensionality reduction improves predictive performance only under certain, rather narrow, conditions.
Specifically, under default regularization (complexity control) settings, dimensionality reduction helps for the more difficult predictive problems (where the predictive performance of a model built using the original feature set is relatively lower), but it actually decreases the performance on the easier problems. More surprisingly, employing state-of-the-art methods for selecting regularization parameters actually eliminates any advantage that dimensionality reduction has! Since the value of building accurate predictive models for business analytics applications has been well established, the resulting guidelines for the application of dimensionality reduction should lead to better research and managerial decisions.
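
The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data: it assumes scikit-learn, uses a small synthetic sparse binary matrix as a stand-in for behavioral data, substitutes truncated SVD as one common matrix-factorization method, and uses L2-regularized logistic regression with a cross-validated grid over `C` as one possible way of selecting the regularization parameter. All names (`build_model`, `results`, the matrix sizes, the grid values) are hypothetical choices made for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for sparse behavioral data (assumption, not the paper's data):
# rows are users, columns are binary indicators of behaviors.
n_users, n_items = 2000, 500
X = sparse.random(n_users, n_items, density=0.02, format="csr", random_state=0)
X.data[:] = 1.0  # make the nonzero entries binary

# Hypothetical target: a noisy linear function of the behaviors, split at the median.
w = rng.normal(size=n_items)
scores = X @ w + rng.normal(scale=0.5, size=n_users)
y = (scores > np.median(scores)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)


def build_model(use_svd: bool, tune_regularization: bool):
    """Build a pipeline with or without SVD features and with or without tuned C."""
    steps = []
    if use_svd:
        # Reduce the sparse matrix to 50 dense latent features.
        steps.append(TruncatedSVD(n_components=50, random_state=0))
    clf = LogisticRegression(max_iter=1000)  # default regularization: C=1.0
    if tune_regularization:
        # Choose C by 3-fold cross-validation on the training set only.
        clf = GridSearchCV(
            clf, {"C": [0.01, 0.1, 1, 10, 100]}, scoring="roc_auc", cv=3
        )
    steps.append(clf)
    return make_pipeline(*steps)


results = {}
for use_svd in (False, True):
    for tune in (False, True):
        model = build_model(use_svd, tune).fit(X_train, y_train)
        auc = roc_auc_score(y_test, model.decision_function(X_test))
        label = ("SVD" if use_svd else "full") + (", tuned C" if tune else ", default C")
        results[label] = auc

for label, auc in results.items():
    print(f"{label:>18}: AUC = {auc:.3f}")
```

The four printed AUC values correspond to the four conditions the abstract contrasts: full versus reduced feature sets, each under default and cross-validated regularization. Results on synthetic data say nothing about the paper's findings; the sketch only shows the shape of such a comparison.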

Suggested Citation

Clark, Jessica and Provost, Foster, Dimensionality Reduction Via Matrix Factorization for Predictive Modeling from Large, Sparse Behavioral Data (May 2015). NYU Working Paper No. 2451/33970. Available at SSRN: https://ssrn.com/abstract=2611543

Jessica Clark (Contact Author)

New York University (NYU) - Department of Information, Operations, and Management Sciences

44 West Fourth Street
New York, NY 10012
United States

Foster Provost

New York University

44 West Fourth Street
New York, NY 10012
United States
