A Poisson Factorization Topic Model for the Study of Creative Documents (and Their Summaries)

62 Pages Posted: 5 Mar 2019

Olivier Toubia

Columbia Business School - Marketing

Date Written: February 13, 2019


We propose a topic model tailored to the study of creative documents (e.g., academic papers, movie scripts). We extend Poisson Factorization in two ways. First, the creativity literature emphasizes the importance of novelty in creative industries. Accordingly, we introduce a set of residual topics that capture the portion of each document that is not explained by a combination of common topics. Second, creative documents are typically accompanied by summaries (e.g., abstracts, synopses). Accordingly, we jointly model the content of creative documents and their summaries, and capture systematic variations in topic intensities between the documents and their summaries. We validate and illustrate the model in three domains: marketing academic papers, movie scripts, and TV show closed captions. We illustrate how the joint modeling of documents and summaries provides some insight into how humans summarize creative documents, and enhances our understanding of the significance of each topic. We show that our model produces new measures of novelty which can inform the perennial debate on the relation between novelty and success in creative industries. Finally, we show how the proposed model may form the basis for decision support tools that assist humans in writing summaries of creative documents.
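
To make the two extensions concrete, the following is a minimal sketch of how word counts might be generated under such a model. It is an illustration under stated assumptions, not the paper's specification: every name (`theta`, `beta`, `theta_res`, `beta_res`, `offsets`, `offset_res`) is hypothetical, the toy values are invented for the example, and treating the document-to-summary variation as a multiplicative offset on each topic intensity is only one plausible parameterization.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the small rates in this toy example.
    """
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def document_rates(theta, beta, theta_res, beta_res):
    """Poisson rate per word: common topics plus one residual topic.

    theta[k]    -- intensity of common topic k in this document (assumed name)
    beta[k][v]  -- weight of word v in common topic k (assumed name)
    theta_res   -- intensity of the document-specific residual topic
    beta_res[v] -- weight of word v in that residual topic
    """
    n_words = len(beta_res)
    return [
        sum(theta[k] * beta[k][v] for k in range(len(theta)))
        + theta_res * beta_res[v]
        for v in range(n_words)
    ]


def summary_rates(theta, beta, theta_res, beta_res, offsets, offset_res):
    """Rates for the summary: same topics, intensities rescaled by offsets.

    The multiplicative offsets are an assumption made for this sketch.
    """
    scaled = [theta[k] * offsets[k] for k in range(len(theta))]
    return document_rates(scaled, beta, theta_res * offset_res, beta_res)


# Toy example: 2 common topics, 4 vocabulary words (all values hypothetical).
rng = random.Random(0)
beta = [[0.5, 0.3, 0.2, 0.0], [0.0, 0.2, 0.3, 0.5]]
theta = [4.0, 1.0]
theta_res, beta_res = 2.0, [0.0, 0.0, 0.0, 1.0]
offsets, offset_res = [1.5, 0.5], 1.0

doc = document_rates(theta, beta, theta_res, beta_res)
summ = summary_rates(theta, beta, theta_res, beta_res, offsets, offset_res)
doc_counts = [sample_poisson(r, rng) for r in doc]
summ_counts = [sample_poisson(r, rng) for r in summ]
```

In this sketch, the share of a document's total rate that comes from the residual term (`theta_res * beta_res[v]` summed over words) is one simple way to think about a novelty measure, and an offset above 1 marks a topic that is emphasized more in the summary than in the document itself.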

Keywords: Topic Models, Natural Language Processing, Creativity

JEL Classification: M31, C02

Suggested Citation

Toubia, Olivier, A Poisson Factorization Topic Model for the Study of Creative Documents (and Their Summaries) (February 13, 2019). Available at SSRN: https://ssrn.com/abstract=3334028 or http://dx.doi.org/10.2139/ssrn.3334028

Olivier Toubia (Contact Author)

Columbia Business School - Marketing

New York, NY 10027
United States
