An Asymptotically Tight Learning Algorithm for Mobile-Promotion Platforms

52 Pages. Posted: 15 Feb 2020


Zhichao Feng

University of Texas at Dallas - Department of Information Systems & Operations Management

Milind Dawande

University of Texas at Dallas - Department of Information Systems & Operations Management

Ganesh Janakiraman

University of Texas at Dallas - Naveen Jindal School of Management

Anyan Qi

University of Texas at Dallas - Naveen Jindal School of Management

Date Written: January 21, 2020

Abstract

Operating under both supply-side and demand-side uncertainties, a mobile-promotion platform conducts advertising campaigns for individual advertisers. Campaigns arrive dynamically over time, which is divided into seasons; each campaign requires the platform to deliver a target number of mobile impressions from a desired set of locations over a desired time interval. The platform fulfills these campaigns by procuring impressions from publishers, who supply advertising space on apps, via real-time bidding on ad exchanges. Each location is characterized by its win curve, i.e., the relationship between the bid amount and the probability of winning an impression at that bid. The win curves at the various locations of interest are initially unknown to the platform, which learns them on the fly from the bids it places and the realized outcomes. Each acquired impression is allocated to one of the ongoing campaigns. The platform's objective is to minimize its total cost (the amount spent procuring impressions plus the penalties incurred for unmet campaign targets) over the time horizon of interest. Our main result is a bidding and allocation policy for this problem. Performance is measured by the regret of a policy, namely the difference between the expected total cost under that policy and the optimal cost for the clairvoyant problem, in which the platform knows the win curves at all the locations in advance. We show that our policy is the best possible (asymptotically tight): the regret under any policy is Ω(√I), where I is the number of seasons, and the regret under our policy is O(√I).
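
The core ingredients above (an unknown win curve per location, learning from realized auction outcomes, and regret against a clairvoyant benchmark) can be illustrated with a small simulation. The Python sketch below is our own toy construction, not the authors' policy: the logistic win curve, the bid grid, the shortfall penalty, and the epsilon-greedy exploration rule are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def win_curve(bid, a=2.0, c=1.0):
    """True (unknown to the platform) win curve: logistic in the bid amount."""
    return 1.0 / (1.0 + np.exp(-(a * bid - c)))

# Discrete grid of candidate bids; the platform estimates the win probability
# at each bid level from realized auction outcomes (win/loss).
bids = np.linspace(0.1, 2.0, 8)
wins = np.zeros_like(bids)    # observed wins per bid level
trials = np.ones_like(bids)   # auctions attempted per bid level (1 avoids 0/0)

def expected_cost(bid, p, penalty=3.0):
    # Per-auction cost: pay the bid if we win; otherwise incur a shortfall
    # penalty for the impression we failed to procure.
    return p * bid + (1.0 - p) * penalty

for t in range(5000):
    p_hat = wins / trials                       # empirical win-curve estimate
    j = np.argmin([expected_cost(b, p) for b, p in zip(bids, p_hat)])
    if rng.random() < 0.05:                     # epsilon-greedy exploration
        j = rng.integers(len(bids))
    won = rng.random() < win_curve(bids[j])     # realized auction outcome
    trials[j] += 1
    wins[j] += won

# Clairvoyant benchmark: the cost-minimizing bid under the true win curve.
best = min(expected_cost(b, win_curve(b)) for b in bids)
learned = min(expected_cost(b, w / n) for b, w, n in zip(bids, wins, trials))
print(f"clairvoyant per-auction cost ≈ {best:.3f}, learned ≈ {learned:.3f}")

Epsilon-greedy exploration here is only a stand-in; the paper develops a specific bidding and allocation policy whose regret provably matches the Ω(√I) lower bound.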

Keywords: online advertising, learning, regret minimization, stochastic dynamic programming

Suggested Citation

Feng, Zhichao and Dawande, Milind and Janakiraman, Ganesh and Qi, Anyan, An Asymptotically Tight Learning Algorithm for Mobile-Promotion Platforms (January 21, 2020). Available at SSRN: https://ssrn.com/abstract=3523491 or http://dx.doi.org/10.2139/ssrn.3523491

Zhichao Feng

University of Texas at Dallas - Department of Information Systems & Operations Management (email)

P.O. Box 830688
Richardson, TX 75083-0688
United States

Milind Dawande

University of Texas at Dallas - Department of Information Systems & Operations Management (email)

P.O. Box 830688
Richardson, TX 75083-0688
United States

Ganesh Janakiraman

University of Texas at Dallas - Naveen Jindal School of Management (email)

P.O. Box 830688
Richardson, TX 75083-0688
United States

Anyan Qi (Contact Author)

University of Texas at Dallas - Naveen Jindal School of Management (email)

P.O. Box 830688
Richardson, TX 75083-0688
United States

