The Summarization of Creative Content

64 Pages. Posted: 21 Aug 2017

Olivier Toubia

Columbia Business School - Marketing

Date Written: August 16, 2017

Abstract

We study and model the process by which humans summarize creative documents (e.g., from a movie script to a synopsis). We develop a customized topic model based on Poisson Factorization and inspired by the creativity literature, which links the text in a summary to the text in the original document. Traditional Poisson Factorization approximates documents as positive combinations of topics, i.e., as points in the cone defined by a set of topics (in the Euclidean space defined by the words in the vocabulary). The model proposed here captures not only this “inside the cone” portion of a document, but also the “outside the cone” portion that is not explained by a combination of common topics. The model captures how these two types of content are weighed in summaries as compared to full documents. In addition, it captures writing norms that influence the extent to which each topic appears in summaries compared to full documents. We apply this model to a dataset of marketing academic papers and their abstracts, and to a dataset of movie scripts and their synopses. We illustrate a practical application of our research by creating a public, online interactive tool meant to serve as a “sounding board” for users interested in writing summaries of creative documents.
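The "inside the cone" and "outside the cone" split described in the abstract can be illustrated with a small sketch. This is not the paper's model, which is a customized Bayesian topic model; it is a simplified stand-in that assumes the topics are already known and fixed, fits non-negative topic weights for one document by maximizing a Poisson likelihood with the standard multiplicative update, and treats the clipped positive residual as the "outside the cone" content. The function names `fit_topic_weights` and `split_document`, the two toy topics, and the word counts are all hypothetical and chosen only for illustration.

```python
def fit_topic_weights(counts, topics, n_iter=200, eps=1e-12):
    """Non-negative weights w maximizing the likelihood of
    counts[v] ~ Poisson(sum_k w[k] * topics[k][v]), topics held fixed.
    Uses the multiplicative update for KL / Poisson factorization."""
    n_topics, n_words = len(topics), len(counts)
    weights = [1.0] * n_topics
    topic_mass = [sum(topic) for topic in topics]
    for _ in range(n_iter):
        # Expected count of each word under the current weights.
        rates = [
            sum(weights[k] * topics[k][v] for k in range(n_topics)) + eps
            for v in range(n_words)
        ]
        # All weights are updated from the same precomputed rates.
        for k in range(n_topics):
            ratio = sum(topics[k][v] * counts[v] / rates[v] for v in range(n_words))
            weights[k] *= ratio / (topic_mass[k] + eps)
    return weights


def split_document(counts, topics):
    """Return (weights, inside, outside), where `inside` is the point in the
    topic cone fitted to the document and `outside` is the positive residual
    (an illustrative simplification, not the paper's definition)."""
    weights = fit_topic_weights(counts, topics)
    inside = [
        sum(w * topic[v] for w, topic in zip(weights, topics))
        for v in range(len(counts))
    ]
    outside = [max(c - i, 0.0) for c, i in zip(counts, inside)]
    return weights, inside, outside


# Hypothetical vocabulary of 5 words and 2 topics; no topic uses word 5.
topics = [
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0],
]
# A document that mixes the two topics and also uses word 5 three times.
novel_doc = [4, 4, 2, 2, 3]

weights, inside, outside = split_document(novel_doc, topics)
print("topic weights:", [round(w, 3) for w in weights])
print("inside the cone:", [round(x, 3) for x in inside])
print("outside the cone:", [round(x, 3) for x in outside])
```

In this toy case the first four word counts are fully explained by a positive combination of the two topics, while the three occurrences of the fifth word cannot be produced by any such combination and end up in the residual. The paper's model goes further by estimating how the two portions are weighted in summaries relative to full documents, which this sketch does not attempt.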

Keywords: Bayesian Estimation, Innovations, Machine Learning, Creativity, Topic Models

Suggested Citation

Toubia, Olivier, The Summarization of Creative Content (August 16, 2017). Columbia Business School Research Paper No. 17-86. Available at SSRN: https://ssrn.com/abstract=3020131 or http://dx.doi.org/10.2139/ssrn.3020131

Olivier Toubia (Contact Author)

Columbia Business School - Marketing

New York, NY 10027
United States

Paper statistics

Downloads: 131
Abstract Views: 661
Rank: 226,176