Learning Optimal Online Advertising Portfolios with Periodic Budgets
41 Pages. Posted: 27 Mar 2019
Date Written: February 17, 2019
Online advertising enables advertisers to reach customers with personalized ads. Advertisers need to determine the right targets for their ads and how much they are willing to pay to engage those targets. A large portion of online ads is priced using real-time auctions, so advertisers must decide which targets to bid on in these auctions. Collaborating with one of the largest ad-tech firms in the world, we develop new algorithms that help advertisers bid optimally on target portfolios while taking into account limitations inherent to online advertising. We study this problem as a Multi-Armed Bandit (MAB) problem with periodic budgets. At the beginning of each time period, the advertiser must select a portfolio of targets that maximizes the expected total revenue (revenue from clicks/conversions) while keeping the total cost of auction payments within the advertising budget. In this paper, we formulate the problem and develop an Optimistic-Robust Learning (ORL) algorithm that uses ideas from Upper Confidence Bound (UCB) algorithms and robust optimization. We prove that the expected cumulative regret of the algorithm is bounded. Additionally, simulations on synthetic and real-world data show that the ORL algorithm reduces regret by at least 10-20% compared to benchmarks.
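To make the setting concrete, the following is a minimal illustrative sketch of a budgeted bandit loop. It is not the paper's ORL algorithm, whose details the abstract does not give. It assumes revenues and costs lie in [0, 1], uses a standard UCB-style confidence radius, treats revenue optimistically and cost conservatively, and fills the per-period budget with a greedy ratio rule. The names `select_portfolio` and `simulate`, the noise model, and all parameters are hypothetical.

```python
import math
import random


def select_portfolio(rev_mean, cost_mean, pulls, t, budget):
    """Pick targets for period t (illustrative rule, not the paper's ORL)."""
    candidates = []
    for i, n in enumerate(pulls):
        # Assumed UCB-style radius; untried targets get the widest bounds.
        radius = math.sqrt(2.0 * math.log(t + 1) / n) if n > 0 else float("inf")
        rev_ucb = min(1.0, rev_mean[i] + radius)    # optimistic revenue estimate
        cost_ucb = min(1.0, cost_mean[i] + radius)  # conservative cost estimate
        candidates.append((rev_ucb / max(cost_ucb, 1e-9), cost_ucb, i))

    # Greedy knapsack heuristic: best revenue-to-cost ratio first.
    chosen, spent = [], 0.0
    for _, cost_ucb, i in sorted(candidates, reverse=True):
        if spent + cost_ucb <= budget:
            chosen.append(i)
            spent += cost_ucb
    return chosen


def simulate(true_rev, true_cost, budget, periods, seed=0):
    """Run the loop on synthetic targets; returns (total revenue, pull counts)."""
    rng = random.Random(seed)
    k = len(true_rev)
    rev_mean, cost_mean, pulls = [0.0] * k, [0.0] * k, [0] * k
    total_revenue = 0.0
    for t in range(1, periods + 1):
        for i in select_portfolio(rev_mean, cost_mean, pulls, t, budget):
            # Assumed noise model: Bernoulli revenue, uniformly perturbed cost.
            rev = 1.0 if rng.random() < true_rev[i] else 0.0
            cost = min(1.0, max(0.0, true_cost[i] + rng.uniform(-0.1, 0.1)))
            pulls[i] += 1
            rev_mean[i] += (rev - rev_mean[i]) / pulls[i]    # running average
            cost_mean[i] += (cost - cost_mean[i]) / pulls[i]
            total_revenue += rev
    return total_revenue, pulls


if __name__ == "__main__":
    revenue, pulls = simulate([0.8, 0.5, 0.2], [0.3, 0.4, 0.6], budget=1.5, periods=500)
    print(revenue, pulls)
```

Because the cost bound is conservative, the sketch keeps the estimated spend within the budget in every period, but the realized spend can still exceed it when the estimates are inaccurate. The greedy ratio rule is also only a heuristic for the underlying knapsack-type selection problem.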
Keywords: Online Advertising, Online Learning, Multi-Armed Bandits, Upper Confidence Bound Algorithm