'Big Data' on the Big Screen: Revealing Latent Similarity among Movies and Its Effect on Box Office Performance

38 Pages Posted: 4 Oct 2016  

Sandra Barbosu

Rotman School of Management, University of Toronto

Date Written: October 2, 2016


Using the movie industry as its setting, this paper asks whether the number of categories a product belongs to helps explain its performance. The prevailing but unproven hypothesis in current research is that spanning multiple categories is associated with worse performance. The intuition is that consumers are more likely to ignore products that do not clearly fit existing market positions. However, no study has provided evidence that the correlation between category-spanning and performance is causal. Importantly, at least two plausible alternative causal mechanisms could account for the relationship, which prevents a causal interpretation. Moreover, current research views the way consumers categorize products as largely unobservable. Instead, for tractability, it relies on category labels assigned to products by market intermediaries as a proxy for how consumers would categorize products. Recent studies have questioned whether this approach truly reflects the way individual consumers categorize. This paper seeks to shed light on previously unobservable consumer categorization through visible online consumer demand patterns, and to address the question of which measure of category-spanning better explains performance: a movie’s traditional genre count, or a new measure based on consumer rental patterns. The widespread digitization of consumer data, enabled by recent technological changes, offers an opportunity to observe consumers' movie preferences through their online rental behavior. These data bring us closer to capturing the way consumers categorize movies. Specifically, I use Amazon Instant Video’s “Customers who rent this [focal movie] also rented...” co-rental lists to uncover latent relationships between movies from revealed rental patterns.
Then, I construct a measure of category-spanning, called Latent Similarity, that quantifies a movie’s similarity to other movies based on their shared co-rentals. My empirical approach seeks to provide supportive evidence that category-spanning, as measured by Latent Similarity, is an explanatory factor of performance, as measured by box office revenues. In this context, it is difficult to show causality directly, because doing so would entail changing a product’s categories while holding all its other characteristics fixed. Instead, I attempt to provide indirect evidence that category-spanning affects performance by refuting two of the most plausible alternative causal mechanisms: reverse causality, and unobservable quality as a common cause. I also show that, as a measure of category-spanning, Latent Similarity explains performance better than the traditional movie genre count. Methodologically, the paper combines econometric methods, including instrumental variable and control function approaches, to support a causal interpretation of the relationship between category-spanning and performance.
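
As an illustration only, the idea of scoring a movie by the co-rentals it shares with other movies might be sketched as follows. The abstract does not state the formula behind Latent Similarity, so the Jaccard overlap, the averaging over all other movies, the function names, and the toy titles below are all assumptions made for this sketch, not the paper's method.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of co-rented titles two movies have in common (0.0 if both lists are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def latent_similarity(co_rentals: dict) -> dict:
    """Average Jaccard overlap of each movie's co-rental list with every other movie's.

    This aggregation is an assumed stand-in, not the measure defined in the paper.
    """
    movies = list(co_rentals)
    totals = {m: 0.0 for m in movies}
    for m1, m2 in combinations(movies, 2):
        s = jaccard(co_rentals[m1], co_rentals[m2])
        totals[m1] += s
        totals[m2] += s
    n_others = max(len(movies) - 1, 1)  # avoid division by zero for a single movie
    return {m: totals[m] / n_others for m in movies}


# Hypothetical co-rental lists (placeholder titles, not real data).
co_rentals = {
    "Movie A": {"X", "Y", "Z"},
    "Movie B": {"X", "Y", "W"},
    "Movie C": {"Q", "R", "S"},
}
scores = latent_similarity(co_rentals)
```

Under these assumptions, a movie whose co-rental list overlaps little with any other movie's receives a low score, which is one way a product sitting outside established clusters of consumer demand could be flagged.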

Keywords: Movies, Categories, Big Data

Suggested Citation

Barbosu, Sandra, 'Big Data' on the Big Screen: Revealing Latent Similarity among Movies and Its Effect on Box Office Performance (October 2, 2016). Available at SSRN: https://ssrn.com/abstract=2846821

Sandra Barbosu (Contact Author)

Rotman School of Management, University of Toronto

Toronto, ON

HOME PAGE: http://sandra.barbosu.com
