Causal Bandits: Online Decision-Making in Endogenous Settings

52 Pages · Posted: 22 Nov 2022 · Last revised: 10 May 2024

Jingwen Zhang

University of Washington

Yifang Chen

University of Washington - Paul G. Allen School of Computer Science & Engineering

Amandeep Singh

University of Washington - Michael G. Foster School of Business; University of Pennsylvania - The Wharton School

Date Written: November 16, 2022

Abstract

The deployment of Multi-Armed Bandits (MAB) has become commonplace in many marketing applications. However, regret guarantees for even state-of-the-art linear bandit algorithms rest on strong exogeneity assumptions with respect to the arm covariates, i.e., the covariates are assumed to be uncorrelated with the unobserved random error. This assumption is often violated in practice, and using such algorithms can lead to sub-optimal decisions. Moreover, marketers are frequently also interested in the asymptotic distribution of the estimated parameters. To this end, we consider the problem of online learning in linear stochastic contextual bandit problems with endogenous covariates. We propose an algorithm, termed $\epsilon$-\textit{BanditIV}, that uses instrumental variables to correct for the endogeneity bias, and we prove an $\tilde{\mathcal{O}}(k\sqrt{T})$ upper bound on its expected regret, where $k$ is the dimension of the instrumental variable and $T$ is the number of rounds. We further establish the asymptotic consistency and normality of the $\epsilon$-\textit{BanditIV} estimator. Extensive Monte Carlo simulations show that $\epsilon$-\textit{BanditIV} significantly outperforms existing methods in endogenous settings. Finally, using daily paid app download data from iOS and Real-Time Bidding (RTB) data, we demonstrate how $\epsilon$-\textit{BanditIV} can simultaneously optimize online decision-making and estimate the causal impact of price and advertising, respectively; in these applications it again performs favorably against competing methods.
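The abstract does not spell out the algorithm, but the ingredients it names — $\epsilon$-style exploration, instrumental variables, and a linear reward model — suggest a simple picture. Below is a minimal, self-contained simulation sketch combining $\epsilon$-greedy arm selection with a two-stage least squares (2SLS) instrumental-variable estimate. The data-generating process, the exploration schedule, and all variable names are illustrative assumptions, not the paper's $\epsilon$-BanditIV specification.

```python
import numpy as np

# Sketch only: epsilon-greedy bandit + 2SLS instrumental-variable estimation.
# The DGP, exploration schedule, and names below are assumptions for
# illustration, not the paper's actual algorithm.

rng = np.random.default_rng(0)

T, k, d, n_arms = 5000, 3, 3, 5      # rounds, instrument dim, covariate dim, arms
beta = rng.normal(size=d)            # true reward parameter (unknown to learner)
Gamma = rng.normal(size=(d, k))      # first stage: covariate = Gamma @ instrument + v

Z_hist, X_hist, y_hist = [], [], []
beta_hat = np.zeros(d)

for t in range(1, T + 1):
    # Each arm arrives with an observed (exogenous) instrument vector.
    Z_arms = rng.normal(size=(n_arms, k))
    V = rng.normal(size=(n_arms, d))           # first-stage noise
    X_arms = Z_arms @ Gamma.T + V              # endogenous covariates

    # epsilon-greedy: explore with decaying probability, else exploit
    # the current IV estimate of beta.
    if rng.random() < min(1.0, 1.0 / np.sqrt(t)):
        a = int(rng.integers(n_arms))
    else:
        a = int(np.argmax(X_arms @ beta_hat))

    # Reward error is correlated with V[a], so OLS on (X, y) would be biased.
    e = 0.5 * V[a].sum() + rng.normal()
    y = X_arms[a] @ beta + e

    Z_hist.append(Z_arms[a]); X_hist.append(X_arms[a]); y_hist.append(y)

    # 2SLS on the logged data: project X onto Z, then regress y on the projection.
    if t > max(d, k):
        Z = np.asarray(Z_hist); X = np.asarray(X_hist); y_vec = np.asarray(y_hist)
        Gamma_hat = np.linalg.lstsq(Z, X, rcond=None)[0]   # shape (k, d)
        beta_hat = np.linalg.lstsq(Z @ Gamma_hat, y_vec, rcond=None)[0]

print("2SLS estimate:", np.round(beta_hat, 3))
print("true beta    :", np.round(beta, 3))
```

In this sketch, an ordinary least squares fit of y on X would be inconsistent because the reward error is correlated with the first-stage noise; projecting X through the instruments removes that correlation, which is the kind of bias correction the abstract describes. Refitting 2SLS on the full history each round is done here for clarity only; a practical implementation would update the estimates recursively.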

Keywords: Multi-Armed Bandits, Causal Inference, Online Learning, Instrumental Variables

Suggested Citation

Zhang, Jingwen and Chen, Yifang and Singh, Amandeep, Causal Bandits: Online Decision-Making in Endogenous Settings (November 16, 2022). Available at SSRN: https://ssrn.com/abstract=4278162 or http://dx.doi.org/10.2139/ssrn.4278162

Jingwen Zhang

University of Washington

Box 353200
Seattle, WA 98195-3200
United States

Yifang Chen

University of Washington - Paul G. Allen School of Computer Science & Engineering

Amandeep Singh (Contact Author)

University of Washington - Michael G. Foster School of Business

Box 353200
Seattle, WA 98195-3200
United States

University of Pennsylvania - The Wharton School

3641 Locust Walk
Philadelphia, PA 19104-6365
United States
