Estimating Audience Interest Distribution Based on Audience Web Behavior
Posted: 21 Feb 2013
Date Written: June 2013
The increasing availability of massive data on users' online behavior presents exciting opportunities for business analytics. In particular, if we could model the distribution of interests of visitors to webpages (or websites), we could apply the result to site optimization, advertisement targeting, content creation, internal offer merchandizing, sponsorship, and general customer analytics. We first present a two-stage generative model for estimating audience interest distributions (AID) for websites, based in part on estimating individual (anonymized) user interest distributions from their observed visitation patterns to labeled websites. The model yields the following interpretation: the AID is the expected interest distribution of a visitor to the website. Estimating AID is important for several reasons: (i) contextual categorization of websites is expensive and/or error-prone at large scale, (ii) even under favorable assumptions, contextual categorization provides only a narrow view of user interests, and (iii) certain sorts of sites (image, video, social) do not lend themselves to easy or accurate contextual categorization. The paper then demonstrates and evaluates the model on a massive set of (anonymized) data from a large online advertising company. We report two main findings. (1) In a predictive modeling-style evaluation, for sites where user interests are (partially) known, the model predicts them well. (2) For pages where contextual categorization does not estimate user interests well (specifically, image pages), the model does estimate them well. We also provide qualitative results demonstrating how the model can reveal interests that are not apparent from contextual categorization.
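To make the interpretation "the AID is the expected interest distribution of a visitor" concrete, the following is a minimal sketch and not the paper's actual generative model. It assumes, purely for illustration, that a user's interest distribution is the visit-weighted average of the category labels of the labeled sites they visited, and that a site's AID is the unweighted mean of its visitors' distributions. The site names, user IDs, categories, and function names are all hypothetical.

```python
from collections import defaultdict


def user_interest(visits, site_labels):
    """Stage 1 (assumed): visit-weighted average of the category
    distributions of the labeled sites a user visited."""
    totals = defaultdict(float)
    weight = 0.0
    for site, count in visits.items():
        dist = site_labels.get(site)
        if dist is None:
            continue  # unlabeled sites contribute no label information
        for category, p in dist.items():
            totals[category] += count * p
        weight += count
    if weight == 0:
        return {}
    return {category: v / weight for category, v in totals.items()}


def audience_interest(site, all_visits, site_labels):
    """Stage 2 (assumed): mean interest distribution over the site's visitors."""
    totals = defaultdict(float)
    n_visitors = 0
    for visits in all_visits.values():
        if site not in visits:
            continue
        dist = user_interest(visits, site_labels)
        if not dist:
            continue  # visitor has no labeled history to learn from
        for category, p in dist.items():
            totals[category] += p
        n_visitors += 1
    if n_visitors == 0:
        return {}
    return {category: v / n_visitors for category, v in totals.items()}


# Hypothetical toy data: two labeled sites and one unlabeled image site.
site_labels = {
    "sports.example": {"sports": 1.0},
    "recipes.example": {"food": 1.0},
}
all_visits = {
    "u1": {"sports.example": 3, "recipes.example": 1, "images.example": 2},
    "u2": {"recipes.example": 2, "images.example": 1},
    "u3": {"sports.example": 1},
}

aid = audience_interest("images.example", all_visits, site_labels)
print(aid)
```

In this toy setup the image site has no contextual label of its own, yet it still receives an interest distribution inferred from what its visitors do elsewhere, which is the situation the abstract highlights for image pages.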