Estimating Audience Interest Distribution Based on Audience Web Behavior

Posted: 21 Feb 2013


Xiaohan Zhang

New York University (NYU) - Department of Information, Operations, and Management Sciences

Foster Provost

New York University

Tsemekhman Kiril

affiliation not provided to SSRN

Date Written: June 2013

Abstract

The increasing availability of massive data on users' online behavior presents exciting opportunities for business analytics. In particular, if we could model the distributions of interests of visitors to webpages (or websites), we could apply the result to applications including site optimization, advertisement targeting, content creation, internal offer merchandizing, sponsorship, and general customer analytics. We first present a two-stage generative model for estimating audience interest distributions (AID) for websites, based in part on estimating individual (anonymized) user interest distributions from their observed visitation patterns to labeled websites. The model yields the following interpretation: the AID is the expected interest distribution of a visitor to the website. Estimating AID is important for several reasons: (i) contextual categorization of websites is expensive and/or error prone at large scale, (ii) even under favorable assumptions, contextual categorization provides only a narrow view of user interests, and (iii) certain sorts of sites (image, video, social) do not lend themselves to easy/accurate contextual categorization. The paper then demonstrates and evaluates the model on a massive set of (anonymized) data from a large online advertising company. We show two main findings. (1) In a predictive modeling-style evaluation, for sites where user interests are (partially) known, the model predicts them well. (2) For pages where contextual categorization does not estimate user interests well (specifically, image pages), the model does estimate them well. We also provide qualitative results demonstrating how the model can reveal interests that are not apparent from contextual categorization.
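The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not specify the estimator, so stage 1 is assumed here to be simple normalized counts of a user's visits to labeled sites, and stage 2 averages those per-user distributions over a site's visitors, matching the stated interpretation of the AID as the expected interest distribution of a visitor. All function names, site names, and categories below are hypothetical.

```python
from collections import Counter, defaultdict


def user_interest_distribution(visits, site_labels):
    """Stage 1 (assumed estimator): category frequencies over a user's
    visits to labeled sites. Unlabeled sites contribute nothing."""
    counts = Counter()
    for site in visits:
        label = site_labels.get(site)
        if label is not None:
            counts[label] += 1
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def audience_interest_distribution(site, user_visits, site_labels):
    """Stage 2: average the interest distributions of the site's visitors,
    i.e. an empirical estimate of a visitor's expected interest distribution."""
    totals = defaultdict(float)
    n_visitors = 0
    for visits in user_visits.values():
        if site not in visits:
            continue
        # Use only the user's visits to *other* sites as evidence.
        others = [s for s in visits if s != site]
        dist = user_interest_distribution(others, site_labels)
        if not dist:
            continue  # no labeled evidence for this visitor
        n_visitors += 1
        for category, p in dist.items():
            totals[category] += p
    if n_visitors == 0:
        return {}
    return {c: v / n_visitors for c, v in totals.items()}


# Hypothetical data: "photos.example" is an unlabeled image site.
site_labels = {
    "espn.example": "sports",
    "nba.example": "sports",
    "recipes.example": "food",
}
user_visits = {
    "u1": ["espn.example", "nba.example", "photos.example"],
    "u2": ["recipes.example", "espn.example", "photos.example"],
    "u3": ["recipes.example"],
}
aid = audience_interest_distribution("photos.example", user_visits, site_labels)
```

In this toy data the image site has no label of its own, yet its audience's interests can still be estimated from what its visitors do elsewhere, which is the situation the abstract highlights for image pages.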

Suggested Citation

Zhang, Xiaohan and Provost, Foster and Kiril, Tsemekhman, Estimating Audience Interest Distribution Based on Audience Web Behavior (June 2013). NYU Working Paper No. 2451/31830, Available at SSRN: https://ssrn.com/abstract=2221762

Xiaohan Zhang (Contact Author)

New York University (NYU) - Department of Information, Operations, and Management Sciences

44 West Fourth Street
New York, NY 10012
United States
2129980390 (Phone)

Foster Provost

New York University

44 West Fourth Street
New York, NY 10012
United States
