Ensembling Experiments to Optimize Interventions along Customer Journey: A Reinforcement Learning Approach
Posted: 11 Oct 2021
Date Written: October 8, 2021
Randomized experiments (A/B tests) are the holy grail of causal inference and have been widely adopted by firms to evaluate online interventions (website design, creative content, pricing, promotions). Most such experiments are designed to nail down the impact of one specific intervention in the customer journey and obtain a clean causal effect. However, the literature on experimentation and causal inference lacks a holistic approach to optimizing a sequence of interventions along the customer journey. Specifically, locally optimal interventions unveiled by one-shot experiments may be globally sub-optimal once their interdependence and the long-term reward along the customer journey are taken into account. Fortunately, the accumulation of a large number of historical experiments creates a trail of exogenous interventions at different stages of customers' path-to-purchase and thus offers a new opportunity. In this paper, we integrate historical experiments with a Reinforcement Learning (RL) algorithm to tackle a question that standalone one-shot experiments cannot answer: how can the ensemble of experiments identify the optimal sequence of interventions along customers' path-to-purchase? We propose a Bayesian Deep Recurrent Q Network (BDRQN) model that leverages the exogenous interventions in the historical experiment data to learn the effectiveness of interventions at different stages of the customer journey and to optimize them for long-term reward. The Bayesian approach empowers the model not only to identify the long-term reward of various interventions but also to estimate the distribution of those expected rewards. Thus, beyond optimization within the existing experiments and data, the BDRQN framework and its resulting estimates can also guide the allocation of future experiments along the customer journey toward high-potential but uncertain interventions.
In summary, the proposed RL+AB approach creates a two-way complementarity between RL and field experiments and thus provides a holistic approach to optimizing the customer journey.
Keywords: Randomized experiments, Customer Journey, Reinforcement Learning, Optimization, Bayesian Deep Recurrent Q Network Model, Experiment Design