Forecasting Sales of New and Existing Products Using Consumer Reviews: A Random Projections Approach
40 Pages Posted: 27 Aug 2015
Date Written: August 25, 2015
We consider the problem of predicting sales of new and existing products using both the numeric and the textual data contained in consumer reviews. Many extant approaches require considerable manual pre-processing of the textual data, which makes them prohibitively expensive to implement and difficult to scale. By contrast, our approach uses a bag-of-words method that requires minimal pre-processing and parsing, making it efficient and scalable. A key implementation challenge in the bag-of-words approach, however, is that the number of predictors can quickly outstrip the available degrees of freedom. Further, the method can require impracticably large computational resources. We propose a random projections approach to deal with the curse of dimensionality that afflicts bag-of-words models. The random projections approach is computationally simple, flexible, and fast, and has desirable statistical properties. We apply the proposed approach to forecast sales at Amazon.com using consumer reviews with an attributes-based regression model. We use the model to produce one-week-ahead rolling-horizon sales forecasts for existing and newly introduced tablet computers. Results show that in both tasks the predictive performance of the proposed approach is strong, and significantly better than that of models that ignore the textual content of consumer reviews and that of a support vector regression machine that uses the textual content. Further, the approach is easily repeatable across product categories and readily scalable to much larger datasets.
Keywords: Big data, forecasting, consumer reviews, textual data, random projections
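The dimensionality-reduction idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the projection matrix, the target dimension, or the regression model, so everything below is an assumption made for illustration: the Gaussian projection entries, the target dimension `k=4`, the toy reviews, and the helper names `bag_of_words` and `random_projection` are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def bag_of_words(reviews):
    """Build a vocabulary and a document-term count matrix with minimal parsing."""
    tokenized = [review.lower().split() for review in reviews]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    matrix = []
    for doc in tokenized:
        row = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            row[index[tok]] = count
        matrix.append(row)
    return vocab, matrix


def random_projection(matrix, k, seed=0):
    """Map each p-dimensional row to k dimensions with a Gaussian random matrix.

    Assumption: entries are i.i.d. N(0, 1/k); other constructions exist.
    """
    rng = random.Random(seed)
    p = len(matrix[0])
    scale = 1.0 / k ** 0.5
    # R has shape p x k and is drawn once, independently of the data.
    R = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(p)]
    return [
        [sum(row[j] * R[j][c] for j in range(p)) for c in range(k)]
        for row in matrix
    ]


# Toy reviews, invented for illustration only.
reviews = [
    "great screen and fast processor",
    "battery died fast and poor screen",
    "great value and great battery life",
]
vocab, X = bag_of_words(reviews)
Z = random_projection(X, k=4)
print(len(vocab), "word-count predictors reduced to", len(Z[0]))
```

In a forecasting setting, the low-dimensional rows of `Z` would stand in for the raw word counts as regression predictors alongside the numeric review data; with a realistic vocabulary the reduction would be from many thousands of columns to a few hundred or fewer.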